Preface


The aim of this book is to provide detailed coverage of the topics in the new OCR AS and A Level 
Computer Science specification. 


The book is divided into twelve sections and within each section, each chapter covers material that can 
comfortably be taught in one or two lessons. Material that is applicable only to the second year of the full 
A Level is clearly marked. Sometimes this may include an entire chapter and at other times, just a small 
part of a chapter. 


Each chapter contains exercises and questions, some new and some from past examination questions. 
Answers to all these are available to teachers only in a free Teacher's Pack which can be ordered from 
our website www.pgonline.co.uk. 


This book has been written to cover the topics which will be examined in the written papers at both
AS and A Level. Sections 10, 11 and 12 relate principally to problem solving skills, with programming
techniques covered in sufficient depth to allow students to answer questions in Component 02.
Pseudocode, rather than any specific programming language, is used in the algorithms given in the 
text. Sample Python programs which implement many of the algorithms are included in a folder with the 
Teacher's Pack. 


This resource is endorsed by OCR for use with specifications H046/H446 AS Level Computer Science
and A Level Computer Science. In order to gain OCR endorsement, this resource has undergone
an independent quality check. Any references to assessment and/or assessment preparation are
the publisher's interpretation of the specification requirements and are not endorsed by OCR. OCR
recommends that a range of teaching and learning resources are used in preparing learners for
assessment. OCR has not paid for the production of this resource, nor does OCR receive any royalties
from its sale. For more information about the endorsement process, please visit the OCR website,
www.ocr.org.uk.


Section 1


Components of a computer 


In this section: 


Chapter 1 Processor components
Chapter 2 Processor performance
Chapter 3 Types of processor
Chapter 4 Input devices
Chapter 5 Output devices
Chapter 6 Storage devices


Chapter 1 — Processor components


Objectives 


* Describe the function of the ALU and Control Unit
* Describe the Fetch-Execute cycle and the role of the following registers:
  o Program Counter
  o Accumulator
  o Memory Address Register
  o Memory Data Register
  o Current Instruction Register


The Central Processing Unit (CPU) 


The CPU, also known simply as the processor, has a number of different components which enable it to 
carry out its task of executing instructions. 


These components include: 
* control unit
* buses
* arithmetic/logic unit (ALU)
* dedicated registers


Control Unit 


The Control Unit controls and coordinates the activities of the CPU, directing the flow of data between 
the CPU and other devices. It accepts the next instruction, decodes it into several sequential steps such 
as fetching addresses and data from memory, manages its execution and stores the resulting data back 
in memory or registers. 


Buses 


A bus is a set of parallel wires connecting two or more components of a computer. It typically consists of
8, 16, 32 or 64 lines. 


The processor is connected to main memory by three separate buses. When the CPU wishes to access 
a particular main memory location, it sends this address to memory on the address bus. The data in that 
location is then returned to the CPU on the data bus. Control signals are sent along the control bus. 


In the figure below, you can see that data, address and control buses connect the processor, memory
and I/O controllers. These three buses are known collectively as the system bus. Each bus is a shared
transmission medium, so that only one device can transmit along a bus at any one time.


Data and control signals travel in both directions between the processor, memory and I/O controllers.
Addresses, on the other hand, travel only one way along the address bus: the processor sends the
address of an instruction, or of data to be stored or retrieved, to memory or to an I/O controller.


Figure: Direction of transmission along the buses (the processor, memory and I/O controllers are linked by the control bus, data bus and address bus, which together form the system bus)


Control bus 


The control bus is a bi-directional bus, meaning that signals can be carried in both directions. The data 
and address buses are shared by all components of the system. Control lines must therefore be provided 
to ensure that access to and use of the data and address buses by the different components of the 
system does not lead to conflict. 


The purpose of the control bus is to transmit command, timing and specific status information between 
system components. 


Control lines include: 


* Bus Request: indicates that a device is requesting the use of the data bus
* Bus Grant: indicates that the CPU has granted access to the data bus
* Memory Write: causes data on the data bus to be written into the addressed location
* Memory Read: causes data from the addressed location to be placed on the data bus
* Interrupt Request: indicates that a device is requesting access to the CPU
* Clock: used to synchronise operations


Data bus 


The data bus, typically consisting of 8, 16, 32 or 64 separate lines, provides a bi-directional path for 
moving data and instructions between system components. 


Address bus 


Memory is divided up internally into units called words. A word is a fixed size group of digits, typically 16, 
32 or 64 bits, which is handled as a unit by the processor, and different types of processor have different 
word sizes. 


Each word in memory has its own specific address. The address bus transmits the memory addresses 
of words that are used as operands in program instructions, so that the data can be retrieved and sent 
back to the processor. When an instruction has been performed and the result is to be stored at a 
particular memory location, it is transmitted via the data bus. 


Arithmetic-Logic Unit (ALU)


The ALU performs arithmetic and logical operations on the data. It can perform instructions such as ADD, 
SUBTRACT, MULTIPLY, DIVIDE on fixed or floating point numbers. It can also perform shift operations, 
shifting bits to the left or right within a register. It can carry out Boolean logic operations, comparing two 
values and using operators such as AND, OR, NOT, XOR. 
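
The operations listed above correspond closely to the arithmetic and bitwise operators found in most programming languages. The following is a minimal Python sketch, assuming an 8-bit register; the register width, the mask and the variable names are illustrative choices and are not taken from any particular processor.

```python
# Illustrative 8-bit ALU operations; the register width is an assumption.
MASK = 0xFF  # keep results within 8 bits, as a fixed-size register would

a = 0b00101101  # 45
b = 0b00000110  # 6

add_result = (a + b) & MASK    # arithmetic: ADD
sub_result = (a - b) & MASK    # arithmetic: SUBTRACT
shift_left = (a << 1) & MASK   # shift bits one place left (doubles the value)
shift_right = a >> 1           # shift bits one place right (integer halving)
and_result = a & b             # Boolean AND, applied bit by bit
or_result = a | b              # Boolean OR
xor_result = a ^ b             # Boolean XOR
not_result = ~a & MASK         # Boolean NOT, limited to 8 bits

print(format(shift_left, "08b"))  # 01011010
print(format(and_result, "08b"))  # 00000100
```

The mask matters because Python integers are unbounded, whereas a hardware register discards any bits that do not fit.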


Registers 


Registers are special memory cells that operate at very high speed. Results of all arithmetic, logical 
or shift operations are temporarily stored in registers and there are typically up to 16 general purpose 
registers in the CPU. 


However, although most modern computers have many registers, some special-purpose processors 
still use a single accumulator, in order to simplify the design. The accumulator takes the place of the 
general purpose registers. For simplicity, we will assume that results of all operations carried out in the 
ALU are stored in a single register called the accumulator. 


Carrying out instructions one after the other requires many different pieces of information to be held. 
As well as the accumulator, there are several other special-purpose registers: 


* the program counter (PC), which holds the address of the next instruction to be executed.
  This may be the next instruction in a sequence of instructions, or, if the current instruction is a
  branch or jump instruction, the address to jump to, copied from the current instruction register (CIR)
  to the PC.
* the current instruction register (CIR), which holds the current instruction being executed, divided
  into opcode and operand.
* the memory address register (MAR), which holds the address of the memory location from which
  data (or an instruction) is to be fetched or to which data is to be written.
* the memory data register (MDR), which is used to temporarily store the data read from or written
  to memory. It is also sometimes known as the memory buffer register.


A simplified diagram showing the connections between these registers is shown below. 


Special-purpose registers in the processor 


The Fetch-Decode-Execute cycle


The sequence of operations involved in executing an instruction can be divided into three phases — 
fetching, decoding and executing it. This cycle is repeated over and over as each instruction of the 
program is executed. 


How the registers are used in the Fetch-Execute cycle 
(Fetch phase) 


1. The address of the next instruction is copied from the program counter (PC) to the memory address 
register (MAR). 


2. The instruction held at that address is copied to the memory data register (MDR). Simultaneously, the 
content of the PC is incremented so that it holds the address of the next instruction. 


3. The contents of the MDR are copied to the current instruction register (CIR). 


(Decode phase) 


4. The instruction held in the CIR is decoded. The instruction is split into opcode and operand and
the opcode is used to determine the type of instruction and what hardware to use to execute it. The
operand holds either:


* the address of the data to be used with the operation, which is then copied to the MAR, or
* the actual data to be operated on, which will be copied to the MDR


The data to be operated on may then be passed to the ALU/accumulator.


(Execute phase) 


5. The appropriate instruction/opcode is carried out on the operand. 
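
The five steps above can be traced with a toy simulation. This is only a sketch: the instruction set (LDA, ADD, STO, HLT), the use of Python tuples for instructions and the memory layout are all assumptions made for illustration, and a real processor encodes instructions as binary words.

```python
# A toy machine. The instruction set and memory layout are hypothetical.
memory = [
    ("LDA", 5),  # 0: load the contents of address 5 into the accumulator
    ("ADD", 6),  # 1: add the contents of address 6 to the accumulator
    ("STO", 7),  # 2: store the accumulator at address 7
    ("HLT", 0),  # 3: stop
    0,           # 4: unused
    20,          # 5: data
    22,          # 6: data
    0,           # 7: the result is stored here
]

pc = 0   # program counter
acc = 0  # accumulator
running = True

while running:
    # Fetch phase
    mar = pc           # step 1: PC copied to MAR
    mdr = memory[mar]  # step 2: instruction copied to MDR ...
    pc += 1            # ... and the PC is incremented
    cir = mdr          # step 3: MDR copied to CIR

    # Decode phase
    opcode, operand = cir  # step 4: split into opcode and operand

    # Execute phase (step 5)
    if opcode == "LDA":
        mar = operand
        mdr = memory[mar]
        acc = mdr
    elif opcode == "ADD":
        mar = operand
        mdr = memory[mar]
        acc += mdr
    elif opcode == "STO":
        mar = operand
        mdr = acc
        memory[mar] = mdr
    elif opcode == "HLT":
        running = False

print(memory[7])  # 42
```

Note how the MAR and MDR are reused in the execute phase whenever an instruction needs to read or write a data value in memory.
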


Exercises
1. (a) In the context of computer architecture, explain what is meant by the term bus. [2]
(b) Name three control lines used by the control bus. [3] 
(c) What is the data bus used for? [2] 


2. Describe the purpose of each of the following parts of a computer. 
(i) Memory unit [3] 
(ii) ALU [3] 
OCR F451/01 Qu § June 2013 


3. The figure below shows an incomplete diagram of the components of a processor. 


(b) The figure below is an incomplete flowchart of the Fetch-Execute cycle. 


Describe the missing steps. [5] 


step 5 Execute Instruction 


Chapter 2 — Processor performance


Objectives 


* Describe the factors affecting the performance of the CPU: clock speed, number of cores, cache
* Understand the use of pipelining in a processor to improve efficiency
* Understand how address and data bus size relates to assembly language programs


Factors affecting processor performance 


The main factors affecting processor performance are: 


* Clock speed
* The number of cores, or duplicate processors, linked together on a single chip
* The amount and type of cache memory


Clock speed 


The system clock generates a series of signals, switching between 0 and 1 millions or even billions of
times per second and synchronising CPU operations. Each CPU operation starts as the clock changes
from 0 to 1 (or in some systems from 1 to 0), and the CPU cannot perform operations faster than the
clock cycle (the time the clock takes to go from 0 to 1 and back to 0).


All processor activities begin on a clock pulse, although some activities may take more than one clock
cycle to complete. One clock cycle per second = 1 hertz (Hz), and clock speed is usually measured in
gigahertz (GHz), where 1 GHz is 1 billion cycles per second. Typical speeds for a PC are between 2 and
4 GHz. Other things being equal, the greater the clock speed, the faster instructions will be executed.
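
To get a feel for these figures, the length of one clock cycle can be worked out from the clock speed. The short Python sketch below uses a hypothetical helper, `cycle_time_ns`, written for this illustration.

```python
def cycle_time_ns(clock_speed_ghz):
    """Return the duration of one clock cycle in nanoseconds."""
    cycles_per_second = clock_speed_ghz * 1_000_000_000
    return 1_000_000_000 / cycles_per_second  # one second is 10^9 nanoseconds

for speed in (2, 4):
    print(speed, "GHz ->", cycle_time_ns(speed), "ns per cycle")
```

At 2 GHz each cycle lasts half a nanosecond, and at 4 GHz a quarter of a nanosecond.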


Number of cores 
In a traditional computer (von Neumann machine), instructions are fetched and executed one at a time 
in a serial manner. However, many computers nowadays have multiple cores. A dual-core processor 
has two processors linked together in the same integrated circuit, and a quad-core processor has four
linked processors.


Each core is theoretically able to process a different instruction at the same time with its own 
fetch-execute cycle, making the processor two or even four times faster with a quad-core chip. 
However, although a dual-core processor has twice the power, it does not always perform twice as 
fast, because the software may not always be able to take full advantage of both processors. 
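
One common way to estimate this effect is Amdahl's law, which is not named in the text and is introduced here only as an illustration. It assumes that a fixed fraction of a program can be shared between cores while the rest must run serially; the function name and the example fractions below are hypothetical.

```python
def speedup(parallel_fraction, cores):
    """Estimated speed-up when only part of a program can use extra cores."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / cores)

print(speedup(1.0, 2))            # 2.0: software that is perfectly parallel
print(round(speedup(0.5, 2), 2))  # 1.33: only half the work can be shared
print(round(speedup(0.5, 4), 2))  # 1.6: four cores help, but not fourfold
```

Under these assumptions, a dual-core processor only doubles performance when all of the work can be divided evenly between the two cores.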


Amount and type of cache memory 
Cache is a small amount of expensive, very fast memory inside the CPU. When an instruction is fetched 
from main memory it is copied into the cache so if it is needed again soon after, it can be fetched from 
cache, which is much quicker than going back to main memory. As cache fills up, unused instructions or 
data still being held are replaced with more recent ones. 


There are different “levels” of cache: 


* Level 1 cache is extremely fast but small (between 2 and 64 KB)
* Level 2 cache is fairly fast and medium-sized (256 KB to 2 MB)
* Some CPUs also have Level 3 cache
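
The way a cache fills up and replaces older entries can be sketched in a few lines of Python. The replacement rule used here, evicting the least recently used entry, is an assumption chosen for illustration; the text does not say which policy a given processor uses, and the class name `SimpleCache` is hypothetical.

```python
from collections import OrderedDict

class SimpleCache:
    """A tiny cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, least recent first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)     # mark as recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[address] = main_memory[address]
        return self.lines[address]

main_memory = {addr: addr * 10 for addr in range(8)}  # stand-in for RAM
cache = SimpleCache(capacity=2)
for addr in (0, 1, 0, 2, 0, 1):
    cache.read(addr, main_memory)

print(cache.hits, cache.misses)  # 2 4
```

Each hit avoids a slower trip to main memory, which is why a larger cache tends to improve performance for programs that reuse the same instructions or data.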


A-Level only 


Pipelining


Pipelining is a technique used by some processors to improve performance. Without pipelining, the steps 
in the Fetch-Execute cycle take place one after the other. While the next instruction is being fetched, the 
ALU, the arithmetic part of the processor, is idle. 


Using pipelining, the computer architecture allows the next instructions to be fetched at the same time as 
the processor is performing arithmetic or logic operations, holding them in a buffer close to the processor 
until the instruction can be performed. 


Processor pipelining is sometimes divided into an instruction pipeline and an arithmetic pipeline. 

The instruction pipeline consists of the stages in which an instruction is moved through the processor, 
including its being fetched, buffered and then executed. The arithmetic pipeline represents the parts 
of an arithmetic operation that can be broken down and overlapped as they are performed. 
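
The benefit of overlapping the stages can be estimated with a simple count of clock cycles. This sketch assumes an idealised pipeline in which every stage takes exactly one cycle and nothing (such as a branch instruction) ever forces the pipeline to be emptied; both function names are hypothetical.

```python
def cycles_without_pipeline(instructions, stages):
    # Each instruction passes through every stage before the next one starts.
    return instructions * stages

def cycles_with_pipeline(instructions, stages):
    # The first instruction fills the pipeline; after that, one
    # instruction completes on every cycle.
    return stages + (instructions - 1)

n, k = 10, 3  # 10 instructions; three stages: fetch, decode, execute
print(cycles_without_pipeline(n, k))  # 30
print(cycles_with_pipeline(n, k))     # 12
```

In practice the gain is smaller than this, because the idealised assumptions rarely hold for a whole program.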


Pipelining is now common in microprocessors used in personal computers. Intel’s Pentium chip uses 
pipelining to execute as many as six instructions simultaneously. 


Words and word size 


Address bus 


Each word, or group of bytes, in memory has its own specific address. When the processor wishes
to read a word of data from memory, it first puts the address of the desired word on the address bus.
The width of the address bus determines the maximum possible memory capacity of
the system. For example, if the address bus consisted of only 8 lines, then the maximum address
it could transmit would be (in binary) 11111111 or 255, giving a maximum memory capacity of 256
locations (including address 0). A system with a 32-bit address bus can address 2^32 (4,294,967,296)
memory locations, giving an addressable memory space of 4 GiB if each location holds one byte. (This
is the memory capacity of an average PC in 2016.)
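
The link between the width of the address bus and the number of addressable locations can be checked with a short calculation. In this Python sketch the function name is hypothetical, and the conversion to GiB assumes that each location holds one byte.

```python
def addressable_locations(address_lines):
    """Number of distinct addresses a bus with this many lines can carry."""
    return 2 ** address_lines

for width in (8, 16, 32):
    count = addressable_locations(width)
    print(width, "lines:", count, "locations, highest address", count - 1)

# With one byte per location, 32 address lines give 4 GiB
print(addressable_locations(32) // 2 ** 30, "GiB")
```

Each extra address line doubles the number of locations that can be addressed.
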


Data bus
The data bus transmits the data held in a word of memory, between processor components and memory.
The largest operand (which is either an address or an actual value) that can be held in a word is therefore
related to the size of the data bus. If the data bus is 16 bits wide, a word cannot hold an integer greater
than 2^16 - 1 (65,535), or more than two 8-bit characters. A wider data bus can transmit larger values,
or more characters at a time, or allow more bits per instruction.
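
These limits follow directly from the number of bits in a word. The sketch below assumes unsigned integers and 8-bit characters; both assumptions, and the function names, are made for this illustration only.

```python
def largest_unsigned(bits):
    """Largest unsigned integer that fits in a word of the given size."""
    return 2 ** bits - 1

def characters_per_word(bits, bits_per_character=8):
    """How many fixed-size characters fit in one word."""
    return bits // bits_per_character

for width in (8, 16, 32):
    print(width, "bits:", largest_unsigned(width), "max value,",
          characters_per_word(width), "characters")
```

For a 16-bit word this gives a maximum value of 65535 and room for two characters.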


How this relates to assembly language 


The basic structure of a machine code instruction in a computer with a 16-bit word may take the format 
shown below: 


Operation code                              | Operand(s)
Basic machine operation | Addressing mode   |


In assembly language, the operation code (opcode) will be expressed as a mnemonic such as ADD,
SUB, LDA (load into the accumulator) etc. With only six bits for the opcode, there cannot be more than
2^6 (64) different instructions. The operand has to be held in only 8 bits. This would clearly not be
practical in a general purpose computer which is more likely to have a word size of 32, 64 or 128 bits.
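
Splitting a 16-bit instruction word into its fields can be done with shifts and masks. The sketch below takes the 6-bit opcode and 8-bit operand from the text and assumes that the remaining 2 bits hold the addressing mode; that width, the order of the fields and the function name `decode` are assumptions made for illustration.

```python
def decode(instruction):
    """Split a 16-bit instruction word into opcode, mode and operand."""
    opcode = (instruction >> 10) & 0b111111  # top 6 bits
    mode = (instruction >> 8) & 0b11         # next 2 bits (assumed width)
    operand = instruction & 0xFF             # bottom 8 bits
    return opcode, mode, operand

# Underscores only separate the three fields for readability
word = 0b000101_01_00101111
print(decode(word))  # (5, 1, 47)

print(2 ** 6)  # 64 possible opcodes with six bits
```

The same masks show the limits described above: no opcode can exceed 63 and no operand can exceed 255.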


Exercises 


1. Name and describe briefly three of the main factors affecting processor performance. [9] 


2. The program below is written in a low-level language. 


AB2F ;Load value 2F into accumulator
BC5D ;Store contents of accumulator at address 5D
E402 ;Add value 2 to accumulator
BCFF ;Store contents of accumulator at address FF
AC61 ;Load accumulator with contents of address 61
BC4A ;Store contents of accumulator at address 4A


(a) What is the name of this language? [1] 
(b) The machine for which this program is written has limited addressing capability.


What are the highest and lowest memory addresses that can be addressed by this 
machine? [2] 


(c) What is the width of the address bus in this machine? [1] 


Chapter 3 — Types of processor


Objectives 


* Describe von Neumann, Harvard and contemporary processor architecture
* Describe the differences between, and uses of, CISC and RISC processors
* Describe GPUs and their uses
* Describe multicore and parallel systems


Memory and the stored program concept 


Computers as we know them were first built and developed in the 1940s and 50s, and two of the early 
pioneers were Alan Turing and John von Neumann. The von Neumann architecture specifies the basic 
components of the computer and processor in which a shared memory and bus is used for both data 
and instructions. 


The stored program concept can be defined as follows: machine code instructions are fetched and
executed serially by a processor that performs arithmetic and logical operations.


* A program must be resident in main memory to be executed
* The machine code instructions are fetched from memory one at a time, decoded and executed in
  the processor


Virtually all computers today are built on this principle, and so the general structure as shown in the figure 
below is sometimes referred to as the von Neumann machine. 


The von Neumann machine 


The stored program concept 


In a von Neumann machine, the same data bus is used to transfer both data and instructions. 
Similarly, a single address bus is used to transfer the addresses of data and instructions. The same 
word length is used for all memory, whether it holds data or instructions. 


Harvard architecture


The Harvard architecture is a computer architecture with physically separate memories for instructions 
and data. Harvard architecture is used extensively with embedded Digital Signal Processing (DSP) 
systems. DSP applications include audio and speech signal processing, sonar and radar signal 
processing, biomedical signal processing, seismic data processing and digital image processing. 


* The two different memories can have different characteristics; for example, in embedded systems
  instructions may be held in read-only memory while data memory requires read-write memory
* In some systems, there is much more instruction memory than data memory so a larger word size is
  used for instructions
* The instruction address bus may be wider than the data bus


Embedded systems include special-purpose computers built into devices often operating in real time, 
such as those used in navigation systems, traffic lights, aircraft flight control systems and simulators. 


Harvard architecture can be faster than von Neumann architecture because data and instructions can be 
fetched in parallel instead of competing for the same bus. 


Figure: Harvard architecture (labelled components: instruction memory, control unit, data memory, I/O)


Comparison of von Neumann and Harvard architectures


Von Neumann architecture | Harvard architecture
Used in conventional processors in PCs, servers and embedded systems with only control functions | Used in digital signal processing and in embedded systems, mobile communication systems, audio, speech and image processing systems
Data and programs share the same memory | Instructions and data are held in separate memories
Programs tend to be large | Parallel data and instruction buses may be used


Contemporary processor architectures


Modern high-performance CPU chips incorporate aspects of both von Neumann and Harvard 
architecture. In one design, there is one main memory for holding both data and instructions, but CPU 
cache memory is divided into an instruction cache and a data cache. Harvard architecture is used as the 
CPU accesses the cache. 


Some digital signal processors such as Texas Instruments TMS320 C55x have multiple parallel data 
buses (two write, three read) and one instruction bus. 


Complex Instruction Set Computers (CISC) 


In the older CISC architecture used by early generations of computer, a large instruction set is used to 
accomplish tasks in as few lines of assembly language as possible. The processor hardware is capable 
of understanding and executing the series of sub-tasks that make up a single instruction. Complex 
instructions are built into the machine's hardware, and the distinguishing feature of a CISC instruction is 
that it combines a “load/store” instruction with the instruction that carries out the actual calculation. 


For example, to multiply two values held in different memory locations A and B, storing the result in A, a 
processor using several general purpose registers would load each of the values into a separate register, 
carry out the multiplication and then store the result back in A. The assembly language instruction for a 
CISC processor might be written something like 


MULT A, B 


A CISC processor has in its instruction set a single instruction that will do the loading, multiplication and 
storing of the result. The instruction is equivalent to the high level instruction a = a * b. 


One advantage of CISC architecture is that the compiler has very little work to do to translate a high-level 
language statement into machine code. Because the code is relatively short, very little RAM is required to 
store the instructions. 


A disadvantage of CISC was that many specialised instructions had to be built into the hardware even 
though only about 20% of them were used in the average program. 


Reduced Instruction Set Computers (RISC) 


The opposite approach is adopted in the more modern RISC architecture. Only simple instructions, each 
taking one clock cycle, can be executed. Thus the multiplication instruction described above might 
be written: 


LDA R1, A
LDA R2, B
MULT R1, R2
STO R1, A


The RISC strategy has the disadvantage that the compiler has to do more work to translate high-level 
code into machine code, and more RAM is required to store the machine code instructions. 
However, because each instruction takes the same amount of time, i.e. one clock cycle, pipelining is
possible, and the four instructions will execute at least as fast as the single CISC instruction. 
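
The difference between the two approaches can be modelled in a few lines of Python. This is a toy model only: the dictionaries standing in for memory and registers, and the function names, are hypothetical, and the comments show the assembly language instruction each line represents.

```python
# Toy model: memory locations A and B, and two general purpose registers.
memory = {"A": 6, "B": 7}
registers = {"R1": 0, "R2": 0}

def cisc_mult(mem, x, y):
    """One complex instruction: load, multiply and store in a single step."""
    mem[x] = mem[x] * mem[y]               # MULT A, B

def risc_program(mem, reg):
    """Four simple instructions that achieve the same result."""
    reg["R1"] = mem["A"]                   # LDA R1, A
    reg["R2"] = mem["B"]                   # LDA R2, B
    reg["R1"] = reg["R1"] * reg["R2"]      # MULT R1, R2
    mem["A"] = reg["R1"]                   # STO R1, A

cisc_memory = dict(memory)
cisc_mult(cisc_memory, "A", "B")

risc_memory = dict(memory)
risc_program(risc_memory, registers)

print(cisc_memory["A"], risc_memory["A"])  # 42 42
```

Both versions leave the same value in A. The RISC version needs more instructions, but each one does a single simple job.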


RISC has largely replaced CISC as a processor design, but CISC is still used for microcontrollers and 
embedded systems. 


Co-processor systems


A co-processor is an extra processor used to supplement the functions of the primary processor
(the CPU). It may be used to perform floating point arithmetic, graphics processing, digital signal
processing and other functions. It may not be a general-purpose processor with the ability to fetch its 
own instructions, do input and output operations and so on. It generally carries out only a limited range 
of functions. 


Multi-core and parallel systems 


Multi-core CPUs are able to distribute workload across multiple CPU cores, thus achieving significantly 
higher performance. 


The IBM Blue Gene supercomputer has 4,098 processors, allowing 560 Teraflops of processing. 
Supercomputers are used on problems such as weather forecasting, running climate change models, 
processing Big Data or sequencing DNA. 


Many personal computers and mobile devices are dual-core or quad-core, meaning they have two or four
processor cores on a single chip.


The improvement in performance gained by using a multi-core processor is dependent on the software 
being able to take advantage of the parallel processing capabilities. Maximizing the usage of the 
computing resources provided by multi-core processors requires adjustments both to the operating 
system and to existing application software. 




Some browsers such as Google Chrome and Mozilla Firefox can run several concurrent processes, and 
a quad-core CPU-based mobile device will deliver higher performance than a single- or dual-core device. 


All four cores may operate when tabbed browsing is used, for example.


Graphics processing unit (GPU)


A GPU is a specialised electronic circuit which is very efficient at manipulating computer graphics and
image-processing. Whereas a CPU has a few cores optimised for sequential serial processing, a GPU
has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed
for handling multiple tasks simultaneously. Its highly parallel structure makes it suitable for tasks where
processing of large blocks of visual data is done simultaneously, i.e. in parallel. In a personal computer, a
GPU may be present on a graphics card, or embedded on the motherboard. GPUs are now finding more
generalised uses in computers used for applications such as machine learning, oil exploration, image
processing and financial transactions.


A-Level only


A GPU is a form of co-processor, and can be used with a CPU to accelerate scientific, engineering,
analytics and other applications, offloading compute-intensive parts of an application to the GPU while
the remainder of the code runs on the CPU. From a user perspective, performance is significantly better.


Figure: Application code divided between GPU and CPU (compute-intensive functions, labelled as 10% of the code, go to the GPU; the rest of the sequential code runs on the CPU)


Case study: DeepMind AlphaGo program 
Go is a Chinese game of far greater complexity than chess. 


In March 2016 world champion Go player Lee Se-dol from South Korea was defeated by Google's
DeepMind AlphaGo program. This was the first time a computer had been able to beat a human
player at the game. DeepMind started by taking a huge database of professional Go matches and
training a program to try to predict what move would come next in any given situation.


AlphaGo runs on Google’s cloud computer 
network, using 1,920 processors and a further 
280 GPUs. 


A —_ version of the program that uses only 48 


one Kenning, 


Exercises
1. (a) (i) Give the name of the computer architecture that uses the fetch-execute cycle with 
a single control unit. [1] 


(ii) Registers used during the fetch-execute cycle include the current instruction
register (CIR), memory address register (MAR), memory data register (MDR) and
program counter (PC).


Place ticks in the table to show which statements are correct during processing. [4] 


                                                | CIR | MAR | MDR | PC
Holds a binary value                            |     |     |     |
Always holds only an address                    |     |     |     |
May change more than once during a single cycle |     |     |     |
May pass a value to the MAR                     |     |     |     |


(b) (i) Compare a Complex Instruction Set Computer (CISC) architecture with a Reduced
Instruction Set Computer (RISC) architecture. [4]
(ii) Explain one advantage, other than cost, of RISC compared with CISC. [2]
(c) Some computer systems use co-processors.


Explain the effect of using a co-processor system for each of the following applications. 
(i) Complex calculations for scientific research [2] 
(ii) Printing personalised letters to customers for an advertising campaign [2] 


OCR F453/01 June 2014 Qu 3 


Chapter 4 — Input devices


Objectives 


* Describe different input devices
* Explain how different input devices can be applied as a solution to different problems


Barcodes 


Barcodes first started appearing on grocery items in the 1970s, and today they are used for identification
in thousands of applications from tracking parcels, shipping cartons, passenger luggage, blood, tissue
and organ products around the world to the sale of items in shops and the recording of the details
of people attending events. Keeping track of anything accurately is now almost unimaginable
without barcodes.


A handheld barcode scanner used for scanning medical samples 


There are two different types of barcode: linear (1D) barcodes such as the one shown above, and 2D
barcodes such as the Quick Response (QR) code, which can hold more information than a 1D barcode.


A 2D barcode 


2D barcodes are used, for example, in ticketless entry to concerts, or access through gates to board
a Eurostar train or passenger airline. They are also used in mobile phone apps that enable the user to
take a photo of the code which may then provide them with further information such as a map of their
location, product details or a website URL.


Barcode readers 


There are four different types of barcode reader available, each using a slightly different technology for
reading and decoding a barcode. The four types are pen-type readers, laser scanners, CCD readers and
camera-based readers.


Pen-type readers
In a pen-type reader, a light source and a photo diode are placed next to each other in the tip of a pen. 
To read a barcode, the tip of the pen is dragged across all the bars at an even speed. The photo diode 
measures the intensity of the light reflected back from the light source and generates a waveform that is 
used to measure the widths of the bars and spaces in the barcode. 
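
The idea of turning a waveform into bar and space widths can be sketched as follows. The sample readings, the 0.5 threshold and the function name `bar_widths` are all hypothetical, and the sketch assumes readings taken at equal intervals while the pen moves at an even speed.

```python
from itertools import groupby

def bar_widths(samples, threshold=0.5):
    """Convert light-intensity samples into (colour, width) runs.

    Low reflected light means a dark bar; high means a white space.
    """
    colours = ["space" if s >= threshold else "bar" for s in samples]
    return [(colour, len(list(run))) for colour, run in groupby(colours)]

# Hypothetical readings from the photo diode
samples = [0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.2, 0.2, 0.9, 0.9, 0.9]
print(bar_widths(samples))
# [('space', 2), ('bar', 3), ('space', 1), ('bar', 2), ('space', 3)]
```

This also shows why an even speed matters: if the pen speeds up or slows down, the same bar produces a different number of samples and its width is misjudged.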


A pen or wand barcode scanner 


Because of their simple design, pen-type scanners are the most durable type of barcode scanner, and 
can be tightly sealed against dust, dirt, and other environmental hazards. However, their applications are 
limited because they must come into direct contact with a barcode to read it. 


Their small size and low weight make this type of barcode scanner ideally suited for use with portable
(laptop) computers or very low volume scanning applications. 


Laser scanners 
Laser scanners work in the same way as pen scanners except they use a laser beam as the 
light source. They are available in a variety of forms, the most familiar being the in-counter units 
in supermarkets. They are reliable and economical for low-volume applications. 


A laser scanner 


Camera-based readers


A camera-based imaging scanner uses a camera and image processing techniques to decode a 1D or 
2D bar code. An imaging scanner can read a barcode on any surface, printed or onscreen, and can also 
read a code that is damaged or poorly printed. They are used in multiple applications such as: 


* age verification by scanning an individual's driving licence
* couponing: a 2D barcode coupon is emailed to a customer, which can be scanned from their phone
  screen at the POS (Point of Sale). Unique codes for each customer and promotion can be stored in
  the barcode, so that tracking coupon usage is easy
* event ticketing: tickets can be issued electronically and then scanned off a phone screen


Consumers can use a cell phone to scan a QR code which can, for example: 


* display a catalogue of movies or DVDs
* play an MP3 when scanned
* display nutrition information about a product


Digital cameras 


A digital camera uses a CCD or CMOS (Complementary Metal Oxide Semiconductor) sensor comprising 
millions of tiny light sensors arranged in a grid. The binary data from each sensor is recorded onto the 
camera's memory card so that the image can be reproduced using suitable software at a computer. 


CCD sensors tend to produce higher quality images and are used in higher-end cameras. They are 
also more reliable, since the technology has been around for much longer. This, however, comes at the 
cost of power consumption: a CCD sensor can use up to 100 times as much power as a CMOS sensor. 


Bayer colour filter applied to a sensor array 


Mastercard is testing a new app that allows customers to make purchases online by taking a selfie rather 
than entering a password. Currently, Mastercard customers enter a password at the point of sale to 
verify their identity, but these can be forgotten, stolen or intercepted. 


Participants in the trial are prompted to take a photograph of their face using the Mastercard app, which 
is then converted to a binary code using facial recognition technology. This is then compared with a 
stored code and if the two match up, the purchase is approved. 


Radio Frequency Identification (RFID) 


This technology uses both input and output - an input device to read the signal from an RFID chip, and 
output to transmit a signal from an active tag (see below). 


In much the same way as barcodes, RFID tags are increasingly being used to identify and track 
everything from household products and cars to bank cards and animals. The difference, however, is that 
an RFID tag can be read without line of sight and from up to 300 metres away. Data can also be passed 
from the tag to the reader and vice versa. An RFID tag consists of a small microchip transponder 
and an antenna. The microchip at the centre of the image below can be manufactured to be less than 
1mm in size, but the antenna must be larger in order for it to communicate with a base unit. This can 
increase the size of the smallest tags to about that of a large grain of rice. These can be embedded in 
special capsules and injected under the skin for the identification of pets. 


RFID chip 


Passive and active tags 


Active tags are physically larger as they include a battery to power the tag so that it actively transmits a 
signal for a reader to pick up. These are used to track things likely to be read from further away, such as 
cars as they pass through a motorway toll booth or runners in a marathon as they pass mile markers. 


Passive tags are much cheaper to produce as they do not have a battery. They rely on the radio waves 
emitted from a reader up to a metre away to provide sufficient electromagnetic power to the tag through 
its coiled antenna. Once energised, the transponder inside the RFID tag can send its data to the reader 
nearby. Passive tags are most common in tagging items such as some groceries and music CDs, and in 
smart cards such as Transport for London's Oyster Card or a contactless bank card. 


Exercises 


1. Describe three different input devices that are used by police for crime detection and prevention. [6] 


2. Describe three different input devices used at a self-checkout in a supermarket, stating for what 
purpose each of them is used. [6] 


Chapter 5 — Output devices 


Objectives 


* Describe how different output devices can be applied as a solution to different problems 


Output devices 


Output devices take data produced by the computer and turn it into a form that humans can understand. 
This could be, for example, written or spoken text, an image on a screen, music or a multimedia 
presentation. A different type of output device is an actuator, which might respond to an input signal to 
turn on a sprinkler, open or close windows in a greenhouse, or perform any number of other actions. 


Common output devices include screens, printers, multimedia projectors, speakers and actuators. 


Screens 
There are various different screen technologies used for computers, phones and other devices. 


LCD monitors 


Liquid crystal display (LCD) monitors use groups of red, green and blue sub-pixels to form each pixel. 
The screen is typically back-lit using light-emitting diodes (LEDs). These have several advantages over 
older technology: 


* they reach their maximum brightness almost immediately 
* the image is sharper with more realistic and vivid colours 
* they produce a brighter light which leads to better picture definition 
* since LEDs are very small, screens can be much thinner in construction 
* they last almost indefinitely which makes the screens much more reliable 
* they consume very little power and therefore produce very little heat as well as reducing 
running costs 


Organic LED (OLED) screens 


These are brighter, thinner and lighter than traditional LCD or LED screens. The screen is plastic rather 
than glass so they are flexible. 


OLED screens can be used wherever LCD screens are used: for TV and computer screens, MP3 and 
cell phone displays. In the future they may be used to make inexpensive animated billboards, super-thin 
pages for electronic books and magazines, as paintings on a wall that can be updated from a computer 
or even in clothes as so-called “wearable technology”. 


They have many advantages over LCDs: 


* when made of plastic rather than glass, they are theoretically flexible enough to print onto clothing 
* they are much thinner 
* they are brighter and need no backlighting, so they consume less power, which translates into longer 
battery life in a portable device 
* LCDs can be slow to refresh (a problem in fast-moving sports or computer games), whereas OLEDs 
respond up to 200 times faster 
* they produce truer colours through a much bigger viewing angle, unlike LCDs where the colours 
darken and disappear if you look from the side 


One drawback is that OLEDs do not last as long, tending to wear out around four times faster than 
LCDs. They are also very sensitive to water, which is a potential problem in a cellphone. 


Printers 


Laser printers 


Laser printers offer high-quality, high-speed printing. Their function is similar to that of a photocopier, 
using powdered ink called toner. 


This type of printer is becoming increasingly affordable and is frequently used as a home printer, in 
businesses and in professional printing services. Colour laser printers are far more expensive to run 
than black and white versions. They contain four toner cartridges (Cyan, Magenta, Yellow and Black or 
CMYK) and the paper must go through a similar process to the black-only printer four times; once for 
each colour. 


The use of laser printers for print jobs other than text is limited by print quality: at about 1200 dpi, 
photorealistic prints are not achievable and are best left to inkjet printers. 


Inkjet printers 


Inkjet printers work by spraying minute dots of ink onto paper to create an image. Depending on the 
resolution (dots per inch) of the model, the number of colour cartridges used and the quality of the 
paper being used, they can produce excellent, photo-realistic images. They are cheaper than laser 
printers but much slower, and the ink cartridges have to be replaced quite frequently. 


Given the choice, it is preferable to use a laser printer when a lot of text needs to be printed, and an 
inkjet printer to produce high quality photographic images. 


Dot matrix printers 


These are known as impact printers. The print head has a matrix of pins which strike the surface of 
the paper through an inked ribbon to form letters. These printers are useful when multi-part stationery is 
required, and they can operate in damp or dirty environments. However, they are noisy, slow and the print 
quality is poor. 


3D printers 
3D printers have been used to create car and aeroplane parts, medical equipment, prosthetic limbs, 
fashion accessories and a multitude of other items. They have even, controversially, been used to 
produce working firearms and other weapons. 


They are used for creating spare parts for obsolete equipment and to produce prototypes of new 
products. They can be used in many situations where a one-off item is required, for example to fill in the 
missing parts of a dinosaur skeleton or a 2000-year-old artefact. 


Multimedia projectors 


What are the benefits of using a multimedia presentation in a classroom? There are many benefits both to 
teachers and students: 


* the whole class can see the projected image; in the bad old days, 20 or more students would crowd 
around a desk trying to catch a glimpse of what the teacher was demonstrating on a 16" screen 
* students no longer have the chore of copying down notes written on a chalkboard or whiteboard 
* having an image to focus on while the teacher is explaining something can aid concentration 
* watching educational videos or even live webcams adds interest to the lesson 


From the teacher's point of view, being able to prepare the lesson in advance and deliver it to several 
different groups, without having to write the same thing on the board every lesson, means the lessons are 
consistent in quality. With the aid of a projector, the teacher can present text, graphics, audio and video 
on the screen, display images or videos from the Internet, display PC applications or programs, and use 
the screen interactively, adding impact to every lesson. 


Multimedia projectors are now viewed as essential classroom tools. 


Computer speakers 


PCs, smartphones and other portable devices generally have a basic inbuilt speaker which can be used 
to output music, voice or sound tracks from a video. High quality speakers can be bought separately and 
when in use, they disable the inbuilt speakers. 


Apart from playing music and video soundtracks, uses include giving verbal instructions in a sat-nav 
system, reading text from the screen for visually impaired people, giving warning beeps and notification 
alerts (e.g. when you receive an email). 


Actuators 


Actuators are motors that are commonly used in conjunction with sensors to control a mechanism, 
for example: 


* opening a window or valve 
* starting or stopping a pump 
* turning a wheel 
* moving an aircraft aileron 
* controlling devices in a “smart home” 




Exercises 


1. Computer software is used in Geography lessons to teach students about weather systems. 


(a) (i) State the purpose of an input device in a computer system when using this software. [1] 
(ii) State the purpose of an output device in a computer system when using this software. [1] 
(b) Describe how the following forms of output will be used by the software. 
(i) Animation [2] 
(ii) Interactive presentation [2] 
OCR June 2014 F451-07 Qu 4 
2. State, with reasons, what type of printer you would recommend for the following applications: 
(a) Invoice/delivery note printed on 3-part paper with 2 carbon copies. [3] 


(b) Flyers produced by a small window-cleaning business to be delivered to all homes in a 
particular area. [3] 


(c) Producing high-quality prints of a set of photographs. [3] 


3. What type of screen would you recommend for an in-flight entertainment system? 
Give reasons for your choice. [5] 


Chapter 6 — Storage devices 


Objectives 


* Know the main uses of magnetic, flash and optical storage devices 
* Describe the uses of and differences between RAM and ROM 
* Describe what is meant by virtual storage 


The need for secondary storage 


A computer’s primary store is Random Access Memory (RAM). Unlike RAM, secondary storage is not directly 
accessible to the processor and has slower access speeds. Secondary storage, however, has the 
advantage that it retains its contents when the computer's power is turned off. Examples include the 
computer's internal hard disk, optical media and solid state disks. 


How storage devices store data 


Hard disks, optical disks and solid state disks all use different methods to store data, but each 
uses a technique that allows it to create and maintain, without power, one of two states representing 
either a 1 or a 0. 


Hard disk 


A hard disk uses rigid rotating platters coated with magnetic material. Ferrous (iron) particles on the disk 
are polarised to become either a north or south state, representing 0 and 1. The disk is divided into 
tracks in concentric circles, and each track is subdivided into sectors. The disk spins very quickly at 
speeds of up to 10,000 RPM. A drive head (like the needle on an old record player) 
moves across the disk to access different tracks and sectors. Data is read from or written to the disk as 
it passes under the drive head. When the drive head is not in use, it is parked to one side of the disk in 
order to prevent damage from movement. A hard disk may consist of several platters, each with its own 
drive head. 


A hard disk, showing a track, a sector, a cluster of four sectors, the platters and the read/write head 


Although hard disks are less portable than optical or solid state media, their huge capacity makes them 
very suitable for desktop purposes. Smaller, denser surface areas spinning under the read-write heads 
mean that newer 3.5 inch disks have capacities of up to 640GB. 


Optical disk 


Optical disks come in three different formats: read-only (e.g. CD-ROM), recordable (e.g. CD-R) and 
rewritable (e.g. CD-RW). An optical disk works by using a high powered laser to “burn” (change the 
chemical properties of) sections of its surface, making them less reflective. A laser at a lower power is 
used to read the disk by shining light onto the surface and a sensor is used to measure the amount of 
light that is reflected back. A read-only CD-ROM disk pressed during manufacture has pits in its surface. 
Those areas that have not been pitted are called lands. At the point where a pit starts or ends, light is 
scattered and therefore not reflected so well. Reflective and non-reflective areas are read as 1s and 0s. 
There is only one single track on an optical disk, arranged as a tight spiral. 


A CD-ROM holds about 700MB of data, whereas a Blu-Ray disk (designed to supersede the DVD disk) 
can hold 50GB. These disks are the same size; their added capacity is owing to the shorter wavelength 
in the laser they use. This creates much smaller pits, enabling a greater number to fit in the same space 
along the track and also means that the track can be more tightly wound, and therefore much longer. 


Recordable disks use a reflective layer with a transparent dye coating that becomes less reflective when 
a laser “burns” a spot in the track. 


Rewritable compact disks use a laser and a magnet in order to heat a spot on the disk and then set 
its state to become a 0 or a 1 using the magnet before it cools again. A DVD-RW uses a phase 
change alloy that can change between amorphous and crystalline states by changing the power 
of the laser beam. 


Optical storage is very cheap to produce and easy to send through the post for distribution purposes. 
It can, however, be corrupted or damaged easily by excessive sunlight or scratches. 


Solid-state disk (SSD) 


Solid state disks are packaged to look like hard disk drives, rectangular in shape and sized to match 
industry-standard dimensions for hard drives, typically 2.5 and 3.5 inches. 


A 480 GB solid state drive 


Inside, however, instead of platters and a read-write head, there is an array of chips arranged on a board. 
These components are put into the standard size “housing” so that they fit into existing laptops and 
desktop PCs. Solid state memory comprises millions of NAND flash memory cells, and a controller that 
manages pages and blocks of memory. Each cell works by delivering a current along the bit and word 
lines to activate the flow of electrons from the source towards the drain. The current on the word line, 
however, is strong enough to force a few electrons across an insulated oxide layer into a floating gate. 
Once the current is turned off, these electrons are trapped. The state of the NAND cell is determined 
by measuring the charge in the floating gate. No charge (with no electrons) is considered a 1 and some 
charge is considered a 0. 


Data is stored in pages (typically 4KiB each), grouped into blocks of, say, 512KiB. NAND flash memory 
cannot overwrite existing data. The old data must be erased before data can be written to the same 
location, and although data can be written in pages, the technology requires a whole block to be 
erased at a time. Because an existing block of NAND cells cannot be updated in place, the data to be 
kept is copied, together with the changes, to a separate empty block. The contents of the original 
block are marked as “invalid” or “stale” and are erased when the user wants to write new data to the 
drive. 
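
The page and block behaviour described above can be sketched in Python. This is a simplified model for illustration only, not how a real SSD controller is implemented: the class `FlashBlock` and the function `update_page` are hypothetical names, and the page and block sizes are simply the figures quoted in the text.

```python
PAGE_SIZE_KIB = 4
BLOCK_SIZE_KIB = 512
PAGES_PER_BLOCK = BLOCK_SIZE_KIB // PAGE_SIZE_KIB   # 128 pages in each block


class FlashBlock:
    """Hypothetical model of one NAND block: pages are written once and erased together."""

    def __init__(self):
        # None means the page is erased and ready to be written
        self.pages = [None] * PAGES_PER_BLOCK
        self.stale = False

    def write_page(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it can be rewritten")
        self.pages[index] = data

    def erase(self):
        # Erasing always clears the whole block, never a single page
        self.pages = [None] * PAGES_PER_BLOCK
        self.stale = False


def update_page(old_block, index, new_data):
    """Copy a block into a fresh block with one page changed, then mark the old block stale."""
    new_block = FlashBlock()
    for i, data in enumerate(old_block.pages):
        if i == index:
            new_block.write_page(i, new_data)
        elif data is not None:
            new_block.write_page(i, data)
    old_block.stale = True
    return new_block


block = FlashBlock()
block.write_page(0, "report v1")
block.write_page(1, "photo")
block = update_page(block, 0, "report v2")
print(block.pages[0], "|", block.pages[1])
```

Changing a single page therefore costs a copy of every page still in use in the block, which is one reason the controller delays erasing stale blocks until the space is needed.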
Although capacity is still relatively low, solid state media have faster access speeds than hard disks. With 
no need to move a read-write head across the disk, one piece of data can be accessed just as quickly as 
any other, even if they are not close together. 


SSDs consume far less power than traditional hard drives, meaning that in a laptop, for example, battery 
life is extended and they stay cooler. In addition, they are less susceptible to damage. 


They are also silent in operation, lighter and highly portable: all considerable advantages in personal 
devices such as mobile phones and MP3 players. 


RAM and ROM 


Computers have two kinds of internal memory: random access memory (RAM) and read-only 
memory (ROM). 


RAM is used to store programs and data that are currently being used. It is volatile, meaning that its 
contents are lost when the computer is switched off. 


ROM is used to hold information that needs to be permanently in memory. The bootstrap loader, for 
example (the small program that starts up as soon as the computer is switched on and causes the 
operating system to be loaded), has to be held in ROM. In embedded systems such as a washing 
machine, vehicle or camera, the software never changes, so it is held in ROM. 


Virtual storage 


In some cases, the computer's RAM may not be large enough to hold all the programs that are open at 
the same time, so the hard disk is used as an extension of memory, called virtual memory. MS Word 
may be open on your desktop but if you are not actually using it at a particular time, the operating system 
may copy the Word software and data to hard disk to free up RAM for the browser software, the VB 
compiler or whatever else you as the user have requested. When you switch back to Word, the operating 
system will reload it into memory. 


Exercises 
1. (a) Describe how data is written to and read from a CD-R disk. 
(b) A school has archived all its students’ reports on to CD-R. Some years later, a copy of a 
particular student's reports is requested. Unfortunately it is found that the documents cannot 
be opened. 
Give two reasons why this may be the case. [2] 


2. If you are considering purchasing a high-end desktop or laptop you might be offered the option 
of a solid-state drive (SSD) rather than a traditional hard disk drive. 


(a) Describe briefly how a solid-state drive differs from a hard disk in its operation. [6] 


(b) Ignoring any differences in price and assuming that both drives have the same capacity, 
state four reasons why you might choose the solid-state drive. [4] 


Systems software 


In this section: 


Chapter 7 Functions of an operating system 
Chapter 8 Types of operating system 
Chapter 9 The nature of applications 


Chapter 10 Programming language translators 


Chapter 7 — Functions of an operating system 


Objectives 
* Understand the function and purpose of an operating system 
* Describe memory management (paging, segmentation and virtual memory) 
* Describe the role of interrupts 
* Describe the role of an Interrupt Service Routine (ISR) within the fetch-decode-execute cycle 
* Describe the need for processor scheduling algorithms 
* Describe scheduling algorithms: round robin, first come first served, multi-level feedback queues, 
shortest job first and shortest remaining time 


What is an operating system? 


An operating system is a program or set of programs that manages the operations of the computer 
for the user. It acts as a bridge between the user and the computer's hardware, since a user cannot 
communicate with hardware directly. 


The operating system is held in permanent storage, for example on a hard disk. A small program called 
the loader is held in ROM. When a computer is switched on, the loader in ROM sends instructions to 
load the operating system by copying it from storage into RAM. 


Diagram showing application software and hardware 


Functions of an operating system 


Regardless of whether the operating system is embedded within an MP3 player or is the latest version of 
Windows installed on a desktop computer, all operating systems share the same basic functions. 


An operating system hides the complexities of managing and communicating with its hardware from 
the user via a simple interface. Through this interface, a user can naively tap away to complete their 
tasks (loading, saving or printing, for example), oblivious to the actual operations taking place behind the 
scenes to support their actions. 


Apart from providing a user interface, the operating system has to perform the following functions: 
* memory management 
* interrupt service routines 
* processor scheduling 
* backing store management 
* management of all input and output 


We will look at what each of these functions involves. 


Memory management 


A PC allows a user to work on several tasks at the same time. You may be listening to music via 
a streaming site such as Spotify, entering a Python program, checking your emails every so often and 
running Word so that you can document your program design. 


Each program, open file or copied clipboard item, for example, must be allocated a specific area of 
memory whilst the computer is running. Should a user wish to switch from one application to another 
in a separate window, each application must be stored in memory simultaneously. The allocation and 
management of space is controlled by the operating system. 


Paging and segmentation 


Paging and segmentation are two different techniques for making the optimum use of memory by 
splitting it into small sections. 


Using a paging system, memory is divided into fixed-size pages, for example of 4KB each, and a process 
currently in memory may be held in several non-contiguous pages. Imagine a program which uses 15KB of 
consecutive memory addresses: these logical memory locations may be physically stored in four 
separate pages anywhere within the physical memory space. A page table maps the logical address 
space of each process to the physical memory addresses where its pages are held. 
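
The arithmetic behind the 15KB example, and the way a page table is used to find a physical address, can be sketched in Python. The page table contents below are invented for illustration, and the function names `pages_needed` and `translate` are hypothetical.

```python
import math

PAGE_SIZE = 4 * 1024   # 4KB pages, as in the example above


def pages_needed(program_size_bytes):
    """Number of whole pages required to hold a program of the given size."""
    return math.ceil(program_size_bytes / PAGE_SIZE)


# Hypothetical page table for one process:
# logical page number -> number of the physical page where it is actually held
page_table = {0: 7, 1: 2, 2: 11, 3: 5}


def translate(logical_address):
    """Convert a logical address into the physical address it is mapped to."""
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    physical_page = page_table[page_number]
    return physical_page * PAGE_SIZE + offset


print(pages_needed(15 * 1024))   # 4 pages for a 15KB program
print(translate(5000))           # logical page 1, offset 904 -> 9096
```

The program sees consecutive addresses from 0 to 15,359, even though its four pages are scattered around physical memory.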
Segmentation is the logical division of address space into varying length segments which depend on 
the program structure. As with paging, it is possible to load only a part of a program into memory initially. 


Virtual memory 


Memory is not limitless, so as more and more jobs are loaded into memory, the operating system may 
swap pages of temporarily inactive jobs out to disk, thus using secondary storage as an extension of 
memory to make room for the next job which has a share of processor time. 


If a large number of jobs are loaded and the computer has insufficient memory, you may notice a 
deterioration in performance as pages are swapped in and out of RAM. Eventually the operating 
system spends most of its time swapping pages in and out, and performance slows right down. 
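
The effect of having too little memory can be shown with a small simulation. This is a sketch under a stated assumption: when memory is full, the page that has been in RAM longest is swapped out (a "first in, first out" policy). The text does not say which policy an operating system uses, and the function name `count_page_faults` is hypothetical.

```python
from collections import deque


def count_page_faults(page_requests, frames_available):
    """Count how often a requested page is not in RAM and must be fetched from disk."""
    in_memory = deque()            # pages currently held in RAM, oldest first
    faults = 0
    for page in page_requests:
        if page not in in_memory:
            faults += 1
            if len(in_memory) == frames_available:
                in_memory.popleft()        # swap the oldest page out to disk
            in_memory.append(page)
    return faults


# A job that repeatedly works through the same four pages
requests = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
print(count_page_faults(requests, 4))   # 4: each page is loaded once
print(count_page_faults(requests, 3))   # 12: every single request needs a swap
```

With room for all four pages, only the initial loads are needed. With room for just three, every request causes a page to be swapped, which is the slowdown described above.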


On a PC you can look at “System Information” to see how much RAM and virtual memory is available. 


Installed Physical Memory (RAM)    4.00GB 
Total Physical Memory              3.25GB 
Available Physical Memory          1.12GB 
Total Virtual Memory               8.12GB 
Available Virtual Memory           3.83GB 


Interrupts 


An interrupt is a signal from a software program, hardware device or internal clock to the CPU. 
A software interrupt occurs when an application program terminates or requests certain services from 
the operating system. A hardware interrupt may occur, for example, when an I/O operation is complete 
or an error such as ‘Printer out of paper’ occurs. 


Interrupts are also triggered regularly by a timer, to indicate that it is the turn of the next process to have 
processor time (see ‘Processor scheduling’ below). It is because a processor can be interrupted that 
multi-tasking can take place. 


Interrupt service routines 


When the CPU receives an interrupt signal, it suspends execution of the running program or process and 
disables all interrupts of a lower priority. It then puts the values of the program counter (PC) and of each 
register onto the system stack, while an Interrupt Service Routine is called to deal with the interrupt. 
Depending on the type of interrupt, a particular routine will be run in order to service it. 


Interrupts are assigned priorities, and lower priority interrupts may be disabled while a higher priority 
interrupt is being serviced. Examples of interrupts in descending order of priority, are given below: 


* Power-fail interrupt 
* Clock interrupt 
* An I/O device sends a signal requesting service or signalling end of I/O operation 


Once the interrupt has been serviced, the original values of the registers are retrieved from the stack and 
the process resumes from the point that it left off. 


A test for the presence of interrupts is carried out at the end of each fetch-decode-execute cycle. 
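
The sequence described in this section can be sketched as a simple simulation. It is a deliberately simplified model: instructions are just strings, the numeric priorities and the function name `run` are assumptions made for illustration, and interrupts that arrive together are simply serviced in priority order rather than modelling the disabling of lower priority interrupts.

```python
import heapq

# Lower number = higher priority, following the order of the list above
POWER_FAIL, CLOCK, IO_DEVICE = 0, 1, 2


def run(program, interrupts_at):
    """Execute each instruction in turn, testing for interrupts at the end of every cycle.

    interrupts_at maps a cycle number to the priorities of the interrupts raised
    during that cycle (made-up data standing in for real hardware signals).
    """
    registers = {"PC": 0, "ACC": 0}
    system_stack = []
    pending = []                       # priority queue of interrupts awaiting service
    log = []

    while registers["PC"] < len(program):
        cycle = registers["PC"]
        log.append(f"execute {program[cycle]}")        # fetch, decode and execute
        registers["PC"] += 1
        for priority in interrupts_at.get(cycle, []):
            heapq.heappush(pending, priority)

        # End of the cycle: service any waiting interrupts, highest priority first
        while pending:
            priority = heapq.heappop(pending)
            system_stack.append(dict(registers))       # save the PC and other registers
            log.append(f"ISR priority {priority}")     # the service routine runs here
            registers = system_stack.pop()             # restore registers and resume
    return log


for line in run(["LOAD", "ADD", "STORE"], {0: [IO_DEVICE, CLOCK]}):
    print(line)
```

In this run, two interrupts arrive during the first instruction. The clock interrupt is serviced before the I/O interrupt, and the program then carries on with ADD exactly where it left off.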
Processor scheduling 


With computers able to run multiple applications simultaneously, the operating system is responsible 
for allocating processor time to each one as they compete for the CPU. While one application is busy 
using the CPU for processing, the OS can queue up the next process required by another application to 
make the most efficient use of the processor. A computer with a single processor can only process one 
instruction at a time, but by carrying out small parts of multiple larger tasks in turn, the processor can 
give the appearance of carrying out several tasks simultaneously. This is what is meant by multi-tasking. 


The scheduler is the operating system module responsible for making sure that processor time is used 
as efficiently as possible. Of course, this is a much more complex task on a large multi-user network 
where many users may, for example, be accessing the same database or running different applications. 


The objectives of the scheduler are to: 

* maximise throughput 

* be fair to all users on a multi-user system 

* provide acceptable response time to all users 

* ensure hardware resources are kept as busy as possible 

There are many different scheduling algorithms and some of them are described below. 


Round robin 


In round robin scheduling, processes are despatched on a first in first out (FIFO) basis, with each process 
in turn being given a limited amount of CPU time called a time slice or quantum. If the process does not 
complete before its time expires, or before a higher priority interrupt occurs, the despatcher gives the 
CPU to the next process. 


In order to do this, the operating system sets an interrupting clock or interval timer to generate 
interrupts at specific times. This method of scheduling helps to guarantee a reasonable response time to 
all users of the system. In some systems, a system of priorities may allow a high priority job to have more 
than one consecutive time slice when its turn comes round. 


Processor time shared — 
time slices 
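
Round robin scheduling can be sketched in a few lines of Python. The process names and run times are invented for illustration, all processes are assumed to be ready at time 0, and priorities are ignored.

```python
from collections import deque


def round_robin(run_times, time_slice):
    """Return the time at which each process finishes under round robin scheduling."""
    queue = deque(run_times.items())     # (name, remaining time), first in first out
    clock = 0
    finish_times = {}
    while queue:
        name, remaining = queue.popleft()
        run_for = min(time_slice, remaining)
        clock += run_for
        remaining -= run_for
        if remaining > 0:
            queue.append((name, remaining))    # back of the queue for another turn
        else:
            finish_times[name] = clock
    return finish_times


print(round_robin({"A": 5, "B": 2, "C": 4}, time_slice=2))
# {'B': 4, 'C': 10, 'A': 11}
```

The short process B finishes after 4 time units even though the longer process A was ahead of it in the queue, which is how round robin keeps response times reasonable for everyone.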


First come first served 
Jobs are processed in the order in which they arrive, with no system of priorities. 


Shortest remaining time 
The process with the smallest estimated time to completion is run next. This tends to reduce the number 
of waiting jobs, and the number of small jobs waiting behind big jobs. Its disadvantage is that it requires 
knowledge of how long a job will take, so the user has to estimate the job time. This is possible for 
batch jobs such as payroll, which are performed regularly and usually run overnight, or for any scientific, 
commercial or other jobs which are run regularly. 
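
A minimal sketch of shortest remaining time scheduling is given below. It assumes the usual pre-emptive form of the algorithm, in which the choice is reconsidered every time unit so that a newly arrived short job can take over from a longer one. The job names, arrival times and run times are invented.

```python
def shortest_remaining_time(jobs):
    """jobs maps a name to (arrival time, estimated run time); returns each finish time."""
    remaining = {name: run_time for name, (arrival, run_time) in jobs.items()}
    finish_times = {}
    clock = 0
    while remaining:
        ready = [name for name in remaining if jobs[name][0] <= clock]
        if not ready:
            clock += 1                 # processor is idle until the next job arrives
            continue
        current = min(ready, key=lambda name: remaining[name])
        remaining[current] -= 1        # run for one time unit, then choose again
        clock += 1
        if remaining[current] == 0:
            del remaining[current]
            finish_times[current] = clock
    return finish_times


print(shortest_remaining_time({"payroll": (0, 6), "query": (2, 2)}))
# {'query': 4, 'payroll': 8}
```

The query arrives at time 2, when payroll still has 4 units left to run, so the query runs first and payroll resumes afterwards.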


Shortest job first 


The process with the smallest estimated running time is run next. Its advantages and constraints are 
much the same as those of the ‘shortest remaining time’ algorithm. In a university environment, for example, 
students will get their short programs run quickly while large research or administration programs which 
are not time-critical will take longer to complete during busy periods of student activity. 
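
The benefit to short jobs can be seen by comparing average waiting times under first come first served and shortest job first. This is a sketch with invented run times, assuming all three jobs are already waiting when scheduling starts.

```python
def average_wait(run_times):
    """Average time jobs spend waiting when they are run in the order given."""
    total_waited = 0
    clock = 0
    for run_time in run_times:
        total_waited += clock      # this job waited for everything before it
        clock += run_time
    return total_waited / len(run_times)


# Hypothetical estimated run times, listed in order of arrival
jobs = {"research": 20, "student1": 2, "student2": 3}

fcfs_order = list(jobs.values())      # first come first served: arrival order
sjf_order = sorted(jobs.values())     # shortest job first: smallest estimate first

print(average_wait(fcfs_order))       # 14.0
print(round(average_wait(sjf_order), 2))   # 2.33
```

Running the two short student programs first cuts the average wait from 14 time units to just over 2, at the cost of delaying the research job by 5 units.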


Multi-level feedback queues 
This algorithm is designed to: 
* give preference to short jobs 
* give preference to I/O bound processes 
* separate processes into categories based on their need for the processor 


The algorithm implements several job queues and jobs can move between queues, depending on how 
much processor time they use. Since input/output (I/O) is so much slower than processor speed, it is 
efficient to try and keep the I/O devices as continuously busy as possible, so that a bottleneck does not 
occur when several programs simultaneously need to send data to the printer, for example. While one job 
is printing, other jobs can use the processor. The aim is to maximise processor use. 
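
The movement of jobs between queues can be sketched as follows. This simplified model only covers processor time: it assumes three queues with time slices of 1, 2 and 4 units (invented values), starts every job in the top queue, and moves a job down a level each time it uses its whole slice without finishing. The preference given to I/O bound processes is not modelled.

```python
from collections import deque


def multi_level_feedback(run_times, time_slices=(1, 2, 4)):
    """Return the order in which jobs complete."""
    queues = [deque() for _ in time_slices]
    for name, run_time in run_times.items():
        queues[0].append((name, run_time))      # every job starts in the top queue
    completed = []
    while any(queues):
        # Always take the next job from the highest queue that is not empty
        level = next(i for i, queue in enumerate(queues) if queue)
        name, remaining = queues[level].popleft()
        remaining -= time_slices[level]
        if remaining <= 0:
            completed.append(name)
        else:
            # Used its whole slice: move down a level (the bottom queue feeds itself)
            lower = min(level + 1, len(queues) - 1)
            queues[lower].append((name, remaining))
    return completed


print(multi_level_feedback({"long": 10, "short": 1, "medium": 3}))
# ['short', 'medium', 'long']
```

Short jobs finish while still in the upper queues, whereas the long job sinks to the bottom queue and only runs when nothing else is waiting.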
Backing store management 


When files and applications are loaded, they are transferred from backing storage into memory. 
The operating system is required to keep a directory of where files are stored so that they can be 
quickly accessed. Similarly, it needs to know which areas of storage are free so that new files or 
applications can be saved. The file management system that comes with your desktop operating system 
enables a user to move files and folders, delete files and protect others from unauthorised access. 


Peripheral management 


Different applications will require different input or output devices throughout their operation. If you send a 
file to print, the operating system will need to communicate with the printer to check that it is switched on 
and online, check that it is a printer and not, say, the keyboard and begin communication to send it the 
correct data to print. 


The data to be printed will then be transferred to an area of memory called a buffer, so that the CPU 
can continue with another task. The purpose of the buffer is to compensate for the difference in speed 
between the printer, or other output device, and the CPU. 
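
The role of the buffer can be sketched with a simple queue. The class `PrintBuffer` and its methods are hypothetical names chosen for illustration; a real operating system would use an area of memory managed by the printer driver.

```python
from collections import deque


class PrintBuffer:
    """Hypothetical model of a buffer: filled quickly by the CPU, emptied slowly by the printer."""

    def __init__(self):
        self.chunks = deque()

    def fill(self, document, chunk_size=8):
        """CPU side: copy the whole document into the buffer in one go."""
        for start in range(0, len(document), chunk_size):
            self.chunks.append(document[start:start + chunk_size])

    def next_chunk(self):
        """Printer side: take the next chunk when ready, or None if the buffer is empty."""
        return self.chunks.popleft() if self.chunks else None


buffer = PrintBuffer()
buffer.fill("Invoice 1042: 3 items")
print("CPU is free to continue with another task")

chunk = buffer.next_chunk()
while chunk is not None:
    print("printing:", chunk)
    chunk = buffer.next_chunk()
```

The CPU only has to wait for the fast copy into the buffer, while the printer removes the data at its own, much slower, pace.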


Exercises 


1. (a) An operating system uses interrupts which have priorities. 
Describe the sequence of steps which would be carried out by the interrupt handler when an 
interrupt is received and handled. [6] 


(b) The operating system of a personal computer supports multi-tasking. One of the operating 
system functions is memory management. 
Describe two different strategies which could be used to manage the available memory. [6] 


2. (a) An operating system uses scheduling. One method of scheduling is first come, first served. 


(i) Explain why the first come, first served scheduling method may not be efficient. [2] 
(ii) Describe one other scheduling method. [2] 
(iii) Explain why scheduling is necessary. [4] 
(b) Explain why memory management is necessary. [3] 
(c) Paging may be used in memory management. Describe paging. [3] 


OCR F453/01 Qu 1 June 2014 

3. (a) Describe what is meant by 
(i) an interrupt [2] 
(ii) a buffer [2] 


(b) A computer system includes a printer. 


(i) Explain the role of the printer buffer in the transfer of a job from the computer 
to the printer. [3] 


(ii) Explain why an interrupt is necessary during the transfer of data from the computer 
to the printer. [3] 


OCR F451/01 Qu 7 June 2013
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Chapter 8 — Types of operating system 


Objectives 
* Describe distributed, embedded, multi-tasking, multi-user and real-time operating systems
* Describe BIOS, device drivers and virtual machines


Distributed operating systems 


A distributed operating system is a form of parallel processing system which spreads the load over
multiple computer servers. A single job is split up into several tasks and each of these is run on a
separate computer, coordinated by the operating system, in such a way that it appears to a user to be
a single system. Intranets, for example, may use a distributed system, in which the system is configured
as a cluster of servers that share memory and tasks, providing more power than a single large server and
resulting in better performance.


Linux, Unix and Windows are all available in distributed versions.


[Figure: print, email and web servers connected to the Internet through a router/firewall]


A multi-tasking system 


A multi-tasking operating system may run on a standalone computer such as a PC or laptop.
The Windows operating system, for example, can run many jobs simultaneously, switching between
them so that each one appears to be the only one running. You may be playing music, entering a Python
or VB program, and checking your emails occasionally. At any one time, if you look at the Task Manager
(press Ctrl-Shift-Esc), you will probably find that it has several programs in memory, most of which are not
currently executing.
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A multi-user, multi-tasking system 


Time-sharing systems are multi-user, multi-tasking systems. A single powerful mainframe or 
supercomputer is connected to dozens or hundreds of terminals all using the mainframe CPU. Each user 
gets a slice of processor time according to a scheduling algorithm, as described in the last chapter. 


[Figure: Terminal 1, Terminal 2 and Terminal 3 connected to a single mainframe]
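As an illustration of time slicing, the sketch below assumes a simple round robin algorithm, which is only one of several possible scheduling algorithms. The function name round_robin, the terminal names and the amounts of processor time are invented for the example.

```python
from collections import deque


def round_robin(jobs, time_slice):
    """jobs maps each user to the processor time they still need."""
    queue = deque(jobs.items())
    order = []
    while queue:
        user, remaining = queue.popleft()
        used = min(time_slice, remaining)
        order.append((user, used))
        if remaining > used:
            # Not finished yet: go to the back of the queue for another turn.
            queue.append((user, remaining - used))
    return order


schedule = round_robin(
    {"Terminal 1": 5, "Terminal 2": 2, "Terminal 3": 4}, time_slice=2
)
```

Each entry in schedule records which user had the processor and for how long. Because the slices are short, every user appears to have the mainframe to themselves.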


Operating systems used by mobile phones 


A mobile phone is a multi-tasking computer that has its own operating system. Operating systems
used on smartphones, tablets, PDAs and other mobile devices are termed mobile operating systems.
They combine the features of a personal computer operating system with their own special features
useful for mobile use such as managing cellular and wireless connectivity as well as phone access.


Typically, for example, smartphones respond to the user's touch: the user can tap on the screen to
open a program, pinch their fingers together or apart to shrink or enlarge the display, or swipe across the screen
to change pages. They also have features useful for mobile systems such as GPS mobile navigation,
a camera, a video camera, speech recognition and a music player.


Most mobile operating systems are tied to specific hardware. Smartphones have two operating
systems: the main system operating the user interface and running the application software,
and a second, low-level proprietary real-time operating system which operates the radio and
other hardware. These low-level systems have a range of security vulnerabilities permitting others
to gain control over a mobile device.
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Embedded operating systems 


Embedded systems are found in all kinds of hardware from a washing machine or microwave oven, to the 
control system of a passenger aircraft or a space shuttle. Clearly the requirements will vary accordingly. 


First, let's look at the simple case of a basic household appliance in which the application program is held 
in ROM. The main features of the operating system are: 


* it will have a minimal user interface, probably consisting of a few buttons or a dial and maybe a
small screen
* it will accept input from sensors, and send output to control devices
* there is a limited amount of RAM so a complex memory management system is not required
* there will not be any permanent data storage devices to be managed


Real-time operating system 


What about the operating system in the flight-control system of a “fly-by-wire” airliner such as the Airbus
A320? This is a real-time, embedded system.


The operating system on the aircraft or similar safety-critical system must have the following features: 
* it must respond very quickly to any inputs or sensors
* it must be able to deal with many inputs simultaneously
* it must have “failsafe” mechanisms designed to detect and take appropriate action if a hardware
component fails
* it must incorporate redundancy, that is, if one component fails, it must automatically switch to
backup hardware


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BIOS (Basic Input Output System) 


BIOS is the program stored in EPROM (Erasable Programmable Read-Only Memory) that gets your 
computer started after you turn it on. 


The fundamental purpose of BIOS in modern PCs is to initialise and test the system hardware 
components and to load the operating system (or the key parts of it) from the hard disk into RAM. 
BIOS was historically used to provide an abstraction layer which allowed a consistent way for 
application programs and the operating system to interact with input-output devices. 


In more modern computers BIOS is not used after loading the operating system. 


Device drivers 


A device driver is a computer program that provides a software interface to a particular hardware device.
This enables operating systems to access hardware functions without needing to know the details of
the hardware being used. When you attach a new printer to your computer, for example, you will have
to install the device driver program that comes with it before it will work. Sometimes the OS will do this
automatically if it detects that the printer is one for which it already has a driver.


Drivers are hardware dependent and operating system specific. A driver typically communicates with 
the device via the system bus or communications subsystem to which the hardware connects. When 
a calling program invokes a routine in the driver, the driver issues commands to the device. Once the 
device sends data back to the driver, the driver may invoke routines in the original calling program. 
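The way a driver hides hardware details behind a common interface can be sketched as follows. This is a simplified model and not a real driver: the classes PrinterDriver, BrandADriver and BrandBDriver and the command formats they produce are invented for illustration.

```python
from abc import ABC, abstractmethod


class PrinterDriver(ABC):
    """The common interface that the (hypothetical) operating system expects."""

    @abstractmethod
    def encode(self, text):
        """Translate plain text into commands for one model of printer."""


class BrandADriver(PrinterDriver):
    def encode(self, text):
        return ["A:" + line for line in text.splitlines()]


class BrandBDriver(PrinterDriver):
    def encode(self, text):
        return ["<B>" + line.upper() for line in text.splitlines()]


def print_document(driver, text):
    # The calling program uses the same method whichever printer is attached.
    return driver.encode(text)


commands_a = print_document(BrandADriver(), "hello\nworld")
commands_b = print_document(BrandBDriver(), "hello\nworld")
```

The function print_document does not need to know which printer is in use; swapping the driver object is enough to change the commands that are sent.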


Virtual machine 


A virtual machine can be defined as any instance where software is used to take on the function of the 
machine, including executing intermediate code or running an operating system within another to emulate 
different hardware. 


Exercises 


1. List four features of the user interface which you would expect to find on a smartphone 
but not on a PC. [5] 


2. Compare and contrast the functions of operating systems designed for a personal computer 
and a satellite-navigation system in a car. In this question you will also be assessed on your 
ability to use good English and to organise your answer clearly in complete sentences, 
using specialist vocabulary where appropriate. [7] 
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Chapter 9 — The nature of applications 


Objectives 
* Distinguish between systems software and applications software
* Describe what is meant by a utility program and give examples
* Be able to justify a suitable application for a specific purpose
* Distinguish between open source and closed source software


Categories of software 


Software may be grouped into separate categories, illustrated in the figure below. 


[Figure: Software divides into Systems Software (Operating Systems, Utility Programs, Library Programs, Translators) and Applications Software (Off-the-Shelf, Custom Written, Proprietary, Open Source)]

Classification of software 


Software can be broadly classified into systems software and applications software. 


Systems software 


System software is the software needed to run the computer's hardware and application programs. 
This includes the operating system, utility programs, libraries and programming language translators. 
Libraries and programming language translators will be considered in the next chapter. 


Operating system 


In the last two chapters we looked at different types of operating system and the function of an operating 
system. The OS is a set of programs that lies between applications software and the computer hardware, 
and has many different functions, including: 


* resource management — managing all the computer hardware including the CPU, memory, disk 
drives, keyboard, monitor, printer and other peripheral devices 


* provision of a user interface (e.g. Windows) to enable users to perform tasks such as running 
application software, changing settings on the computer, downloading and installing new 
software, etc. 
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Utility programs 


Utility software is system software designed to optimise the performance of the computer or perform 
tasks such as backing up files, restoring corrupted files from backup, compressing or decompressing 
data, encrypting data before transmission, providing a firewall, etc. 


Disk defragmentation 


A disk defragmenter is a program that will reorganise a magnetic hard disk so that files which have 
been split up into blocks and stored all over the disk will be recombined in a single series of sequential 
blocks. This makes reading a file quicker. The software utility Optimise Drives, previously called Disk 
Defragmenter, runs automatically on a weekly schedule on the latest versions of Windows. You can also 
optimise drives on your PC manually. 


Automatic backup 


Several free automatic backup utilities are available for personal and commercial use. An automatic
backup utility will allow you to specify:

* where you want to store the backup (the destination)
* what you want to back up (the sources)
* how you want to run the backup (using full backup that zips the files, or mirror backup that doesn't
zip them)
* when you want to run the backup (you can schedule it to run automatically or run it manually)

You can then run the backup manually (typically by using a function key) or schedule it to run
automatically. (See for example http://www.fbackup.com/)


Automatic updating 
An automatic update utility makes sure that any software installed on the computer is up-to-date. For any 
software already installed on the computer, the automatic update utility will regularly check the Internet 
for updates. These will be downloaded and installed if they are newer than the version already on 
the computer. 


Firewalls and antivirus software must be updated regularly as new viruses and threats are constantly 
being devised and discovered. 


Application software should also be updated as there will be bug fixes and improvements that become 
available to people with a licence for that package. 


Virus checker 


A virus checker utility checks your hard drive and, depending on the level of protection offered, 
incoming emails and internet downloads, for viruses and removes them. Windows 8.1 comes with built-in 
virus protection called Windows Defender. 
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Compression software 
Several utility programs are supplied as part of the operating system. These include utilities to copy, move
and delete files, create, move and delete folders, and provide screensavers. Other utility programs such as
WinZip for compressing and sharing files have to be purchased from independent suppliers.


Zipped or compressed files can be transmitted much more quickly over the Internet. Sometimes there is
a limit to the size of a file which can be transmitted: if you have a 15MB photograph, you will not be able
to email it to a friend if there is a 5MB limit on the attachments they can receive. Even if they can receive
the file, it may take several minutes to download if they do not have a broadband connection.
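The effect of compression can be seen with a short experiment using Python's standard zlib module. The sample data here is an assumption chosen for the example: it is deliberately repetitive, so it compresses very well. A photograph already saved in a compressed format would shrink far less.

```python
import zlib

# Highly repetitive sample data (an assumption made for this illustration).
original = b"The quick brown fox jumps over the lazy dog. " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print("original size:  ", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")
print("restored correctly:", restored == original)
```

This is lossless compression: decompressing gives back exactly the bytes that were compressed.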


Applications software 


Applications software can be categorised as general-purpose, special-purpose or custom-written 
(bespoke) software. 


General-purpose software such as a word-processor, spreadsheet or graphics package, can be used
for many different purposes. For example, a graphics package may be used to produce advertisements
or animations, manipulate photographs, or draw vector or bitmapped images.


Special-purpose software performs a single specific task or set of tasks. Examples include payroll
and accounts packages, hotel booking systems, fingerprint scanning systems, browser software and
hundreds of other applications. Software may be bought “off-the-shelf”, ready to use, or it may be
specially written by a team of programmers for a particular organisation. If, say, a hotel wants to buy
some visitor booking software, they may be able to find a ready-made package that is quite suitable, or
they may want a bespoke software package that will satisfy their particular requirements.


“Off-the-shelf” vs bespoke software 


Off-the-shelf software | Bespoke software
Less expensive since the cost is shared among all the other people buying the package | More costly and requires expertise to analyse and document requirements
May contain a lot of unwanted features, and some desirable but non-essential features may be missing | Features customised to user requirements and other features can be added as needs arise
Ready to be installed immediately | May take a long time to develop
Well documented, well-tested and error-free | May contain errors which do not surface immediately
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Open source vs closed source 
Open Source software is governed by the Open Source Initiative that says: 
* Software is licensed for use but there is no charge for the licence. Anyone can use it.
* Open Source software must be distributed with the source code so anyone can modify it.
* Developers can sell the software they have created.
* Any new software created from Open Source software must also be “open”. This means that it must
be distributed or sold in a form that other people can read and also edit.


NB: This is different from Freeware (free software) which may be free to use but the user does not get 
access to the source code. Freeware usually has restrictions on its use as well. 


Closed source or proprietary software is sold in the form of a licence to use it. 


* There will be restrictions on how the software can be used, for example the licence may specify only
one concurrent user, or it may permit up to, say, 50 users on one site (site licence).
* The company or person who wrote the software will hold the copyright. The users will not have
access to the source code and will not be allowed to modify the package and sell it to other people.
This would infringe the copyright (Copyright, Designs and Patents Act).


The benefit of using proprietary software is the support available from the company. There will
be regular updates available and technical support lines, training courses and a large user base.
Open Source software tends to be more organic: it changes over time as developers modify
source code and distribute new versions. There isn't a commercial organisation behind the software so
there probably won't be a helpline or regular updates, just a community of enthusiastic developers.


Selecting an application 


How would you select suitable software for a particular purpose? You might use some of the following 
criteria: 


* Does it provide all the necessary functionality?
* Does it run on the available hardware?
* Is it available “off the shelf” or will it have to be specially written?
* How much will it cost?
* Is it well-used, tried and tested?
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Exercises 
1. (a) Software can be classified as either system or application software. What is meant by 
(i) system software? [1] 
(ii) application software? [1] 
(b) Give an example of each type of software. [2] 


2. A company sells widgets via an online web store. The process of updating the website and
processing sales involves many different types of software.


Below is a list of software: 


Operating system, Utility software, Special-purpose software, General purpose 
application software, Bespoke software 


Complete the table below by writing one software category beside each use. You should 
not use a category more than once. [4] 


Use | Software category
Store's own online ordering system designed for their products and systems |
Graphics software to crop product images suitable for uploading to the site |
Online payment verification software |


3. Describe three reasons why a company might choose to purchase an “off-the-shelf” special 
purpose software package rather than a suite of programs written specifically for their needs. [6] 


4. A student owns a computer which he uses for:
* producing project work in hard copy form
* playing games with friends on the internet
* downloading video and music files
He uses a number of pieces of utility software.


State the purpose of each of the following types of utility software and describe how the 
student would use them. 


(i) Compression software [3] 
(ii) Anti-virus software [3]
(iii) Backup utility [3] 


OCR F451/01 Qu 8 June 2013
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Chapter 10 - Programming language translators 


Objectives 
* Understand the role of an assembler, compiler and interpreter
* Explain the difference between compilation and interpretation, and describe situations when both
would be appropriate
* Explain why an intermediate language such as bytecode is produced as the final output by some
compilers and how it is subsequently used
* (A Level only) Describe the stages of compilation: lexical analysis, syntax analysis, code generation and optimisation
* (A Level only) Describe the function of linkers and loaders
* (A Level only) Describe the use of libraries


Assembler 


Assembly code is a low-level language, with each instruction in assembly code almost always being 
equivalent to one machine code instruction. The machine code instructions that a particular computer 
can execute (the instruction set) are completely dependent on its hardware, and therefore each different 
type of processor will have a different instruction set and a different assembly code. 


Typically, several lines of low-level code instructions are required to achieve the same result as a single 
line of high-level code. 


Before an assembly code program can be executed, it must be translated into the equivalent machine
code, or an intermediate form called bytecode. This is done by a program called an assembler.
The assembler program takes each assembly code instruction and converts it to the 0s and 1s of the
corresponding machine code instruction. The input to the assembler is called the source code and the
output (machine code) the object code.


Compiler 


A compiler is a program that translates a high-level language such as Visual Basic, Python etc. into 
machine code. The code written by the programmer, the source code, is input as data to the compiler, 
which scans through it several times, each time performing different checks and building up tables of 
information needed to produce the final object code. Different hardware platforms will require different 
compilers, since the resulting object code will be hardware-specific. For example, Windows and the Intel 
microprocessors comprise one platform, Apple and PowerPC processors another, so separate compilers 
are required for each. 


The object code can then be saved and run whenever needed without the presence of the compiler. 


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Interpreter 


An interpreter is a different type of programming language translator. Once the programmer has written 
and saved a program, and instructs the computer to run it, the interpreter looks at each line of the source 
program, analyses it and, if it contains no syntax errors, translates it into machine code and runs it. 


For example, the following Python program contains an error at line 5. 


1 a = 1
2 b = 2
3 c = a + b
4 print("a+b=", c)
5 e = a - n
6 print("a - b=", e)
7 print("goodbye")

When the program runs, it produces the following output:

a+b= 3
Traceback (most recent call last):
  File "C:/Users/A Level sample programs/prog1.py", line 5, in <module>
    e = a - n
NameError: name 'n' is not defined

The program produces output at line 4, gets as far as line 5 and then crashes.


However, it is not always quite that simple. If we modify the program to introduce a syntax error at line 6, 
(missing closing bracket) the interpreter does not attempt to run any of the program until this is fixed. 


1 a = 1
2 b = 2
3 c = a + b
4 print("a+b=", c)
5 e = a - b
6 print("a - b=", e
7 print("goodbye")


When the program runs, it does not execute any of the code but produces the following output: 


[Screenshot: the Python editor window showing a SyntaxError dialog with the message "invalid syntax"]


From this we can deduce that the translator has scanned through the whole program checking for certain 
types of error before executing any of it. 
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Bytecode 
Many languages are not only compiled or only interpreted; there are various possibilities in between.


Interpreting each line of code just before executing it has become much less common. Most interpreted
languages such as Python and Java combine compiling and interpreting: the source code is first compiled
into an intermediate representation called bytecode, which is then executed by a bytecode interpreter.


The bytecode may be compiled once and for all (as in Java) or each time a change in the source code is 
detected before execution (as in Python). 
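Python lets you look at this intermediate form directly. The sketch below uses the built-in compile() function and the standard dis module to compile one statement to bytecode and list its instructions. The exact instruction names depend on the version of Python being used, so none are shown here.

```python
import dis

source = "c = a + b"

# Compile the source text into a code object containing bytecode.
code_object = compile(source, "<example>", "exec")

# List the names of the bytecode instructions (these vary between versions).
instruction_names = [instr.opname for instr in dis.get_instructions(code_object)]
print(instruction_names)

# The bytecode interpreter then executes the compiled code.
namespace = {"a": 1, "b": 2}
exec(code_object, namespace)
print(namespace["c"])
```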


A big advantage of bytecode is that you can achieve platform independence; any computer that 
can run Java programs has a Java Virtual Machine (JVM), a piece of software which masks inherent 
differences between different computer architectures and operating systems. The JVM understands 
bytecode and converts it into the machine code for that particular computer. 


A second advantage of using, for example, Java bytecode is that it acts as an extra security layer 
between your computer and the program. You can download an untrusted program and you then 
execute the Java bytecode interpreter rather than the program itself, which guards against any 
malicious programs. 


It is also possible to compile from Python into Java bytecode (using the Jython compiler) and then use 
the Java interpreter to interpret and execute it. 


Advantages and uses of compilers and interpreters 
A compiler has many advantages over an interpreter: 


* the object code can be saved on disk and run whenever required without the need to recompile.
However, if an error is discovered in the program, the whole program has to be recompiled
* the object code executes faster than interpreted code
* the object code produced by a compiler can be distributed or executed without having to have the
compiler present
* the object code is more secure, as it cannot be read without a great deal of 'reverse engineering'

A compiler would therefore be appropriate when a program is to be run regularly or frequently, with only
occasional change. It is also appropriate when the object code produced by the compiler is going to be
distributed or sold to users outside the company that produced the software, since the source code is
not present and therefore cannot be copied or amended.


An interpreter has some advantages over a compiler: 


* platform independence: the source code can be run on any machine which has the appropriate
interpreter available (e.g. Java's bytecode)
* it is useful for program development as there is no need for lengthy recompilation each time an error
is discovered


Disadvantages of an interpreter 


The program may run slower than a compiled program, because each statement has to be translated to 
machine code each time it is encountered. So if a loop of 10 statements is performed 20 times, all 10 
statements are interpreted 20 times. 
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There are three stages of compilation: lexical analysis, syntax analysis, and code generation and 
optimisation. These stages are described below. 


Stages of compilation 


Lexical analysis 
Lexical analysis performs the following functions. 


1. Superfluous spaces are removed.

print (   total_mark,    average   ) will be converted to
print(total_mark,average)


2. All comments, identified for example by # or //, will be removed from the program.
3. Some simple error-checking is performed, for example:
* an illegal identifier (such as X&Y or ten in Python) would be flagged as an error
* the lexical analyser will detect an attempt to assign an illegal value to a constant, such as a value
of the wrong type or one that causes overflow or underflow

(The lexical analyser will not detect misspelt keywords or undeclared variables; this is the job of the
syntax analyser.)

4. All keywords, constants and identifiers (e.g. variable names) used in the source code are replaced by
'tokens' (unique symbols). For example, numbers will be converted to their run-time representation,
and identifiers will be replaced by a pointer to an address in the symbol table. Keywords such as
input, print will be replaced by a single item-code.
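The steps above can be sketched as a very small lexical analyser. This is a simplified illustration rather than the lexer of any real compiler: the set of keywords, the token names and the symbols it accepts are all assumptions made for the example, and it returns (kind, text) pairs rather than pointers into a symbol table.

```python
import re

KEYWORDS = {"input", "print"}  # assumed keyword list for this example

TOKEN_PATTERN = re.compile(r"""
    (?P<comment>\#.*)                  |
    (?P<number>\d+(?:\.\d+)?)          |
    (?P<name>[A-Za-z_][A-Za-z_0-9]*)   |
    (?P<symbol>[=+\-*/(),])            |
    (?P<space>\s+)                     |
    (?P<error>.)
""", re.VERBOSE)


def tokenise(line):
    tokens = []
    for match in TOKEN_PATTERN.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind in ("space", "comment"):
            continue  # superfluous spaces and comments are discarded
        if kind == "error":
            raise ValueError(f"illegal character {text!r}")
        if kind == "name" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


tokens = tokenise("area = pi * radius * radius   # area of a circle")
print(tokens)
```

Calling tokenise("X&Y") raises an error because & is not a symbol this sketch recognises, which mirrors the simple error-checking described in step 3.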


The symbol table 


The symbol table plays a central role in the compilation process. It will contain an entry for every keyword 
(reserved word) and identifier in the program. The exact format of the entries in the table will vary from 
compiler to compiler, but typically, entries in the table will show: 


* the identifier or keyword
* the kind of item (variable, array, procedure, keyword etc.)
* the type of item (integer, real, char etc.)
* the run-time address of the item, or its value if it is a constant
* a pointer to accessing information (e.g. for an array, the bounds of the array, or for a procedure,
information about each of the parameters).


Typical entries in a symbol table might be as given below: 


item | item name | kind of item | type of item | run-time address or value | pointer
1 | input | keyword | | |
2 | pi | constant | real | 3.14159 |
3 | radius | variable | real | (?) |
4 | = | operator | | |
5 | area | variable | real | |
6 | numSides | array | integer | (?) | (?)
7 | * | operator | | |
8 | | | | |
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The statements input (radius) 

area = pi * radius * radius 
could be 'tokenised' and stored as the lexical string

1 3
5 4 2 7 3 7 3


Note that the lexical analyser puts the identifier and its run-time address in the symbol table, so that it 
can replace them in the source code by ‘tokens’. It will not fill in the ‘kind of item’ and ‘type of item’; this 
is done later by the syntax analyser. 


Accessing the symbol table 


Since the lexical analyser spends a great proportion of its time looking up the symbol table, this activity
has a crucial effect on the overall speed of the compiler. The symbol table must therefore be organised
in such a way that entries can be found as quickly as possible. The most common way of organising the
symbol table is a hash table, where the keyword or identifier is 'hashed' to produce an array subscript.
As with any hash table, synonyms (collisions) are inevitable, and a common way of handling them is to
store the synonym in the next available free space in the table.
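A minimal sketch of this arrangement is given below. The table size of 11 and the hashing function (adding up the character codes) are assumptions chosen to keep the example small; a real compiler would use a larger table and a better hashing function. The names pi and ip hash to the same subscript, so the second one is stored in the next free space.

```python
TABLE_SIZE = 11  # assumed size, kept small for illustration


def hash_name(name):
    """Hash an identifier to an array subscript by summing its character codes."""
    return sum(ord(ch) for ch in name) % TABLE_SIZE


def insert(table, name):
    index = hash_name(name)
    for _ in range(TABLE_SIZE):
        if table[index] is None or table[index] == name:
            table[index] = name
            return index
        index = (index + 1) % TABLE_SIZE  # collision: try the next space
    raise RuntimeError("symbol table is full")


def lookup(table, name):
    index = hash_name(name)
    for _ in range(TABLE_SIZE):
        if table[index] is None:
            return None  # an empty space means the name was never stored
        if table[index] == name:
            return index
        index = (index + 1) % TABLE_SIZE
    return None


symbol_table = [None] * TABLE_SIZE
positions = {name: insert(symbol_table, name)
             for name in ["pi", "ip", "radius", "area"]}
print(positions)
```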


Syntax analysis and semantic analysis 


Syntax analysis is the process of determining whether the sequence of input characters, symbols, items
or tokens forms a valid sentence in the language. In order to do this, the language has to be expressed as
a set of rules, using for example syntax diagrams or Backus-Naur form.


Parsing is the task of systematically applying the set of rules to each statement to determine whether
it is valid. Stacks will be used to check, for example, that brackets are correctly paired. The priorities of
arithmetic operators will be determined, and expressions converted into a form (such as reverse Polish
notation) from which machine code can more easily be generated.
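Both ideas can be illustrated with short Python functions. These are sketches under simplifying assumptions, not the method used by any particular compiler: the bracket checker ignores the possibility of brackets inside strings, and the conversion to reverse Polish notation (using the well-known shunting-yard approach) handles only the four arithmetic operators and assumes the brackets are already correctly paired.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def brackets_balanced(statement):
    """Use a stack to check that every bracket has a matching partner."""
    stack = []
    for ch in statement:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def to_reverse_polish(tokens):
    """Convert a list of infix tokens to reverse Polish notation."""
    output, operators = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Higher or equal priority operators on the stack are output first.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the matching "("
        else:
            output.append(token)  # an operand
    while operators:
        output.append(operators.pop())
    return output


print(brackets_balanced('print("a - b=", e'))
print(to_reverse_polish("a + b * c".split()))
print(to_reverse_polish("( a + b ) * c".split()))
```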


The semantics of the program will also be checked in this phase. Semantics define the meaning rather 
than the grammar of the language; it is possible to write a series of syntactically correct statements which 
nevertheless do not obey the rules for writing a correct program. An example of a semantic error is the 
use of an undeclared variable in Pascal, or trying to assign a real value to an integer variable, or using a 
real number instead of an integer as the counter in a for ... next loop. 


Code generation and optimisation 


This is the final phase of compilation, when the machine code is generated. Most high-level language 
statements will be translated into a number of machine code statements. 


Code optimisation techniques attempt to reduce the execution time of the object program by, for 
example, spotting redundant instructions and producing object code which achieves the same net 
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effect as that specified by the source program but not by the same means. The disadvantages of code 
optimisation are:


* it will increase compilation time, sometimes quite considerably
* it may sometimes produce unexpected results. Consider the following program extract, which is
supposed to measure the speed of the object program. Assume GetTime is a function which
returns the current time set in the operating system:

start = GetTime
for count = 1 to 100000
    x = 0
endfor
finish = GetTime
print(start, finish)


The effect of code optimisation may be to detect that it is quite unnecessary to perform the loop 100000 
times to set x equal to 0, and optimise the code so that it is only done once! 
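A much simpler optimisation, known as constant folding, can be observed in Python itself. The sketch below assumes the standard CPython implementation, whose compiler works out constant arithmetic once at compile time rather than every time the statement runs. Other implementations may behave differently.

```python
source = "seconds_per_day = 24 * 60 * 60"
code_object = compile(source, "<example>", "exec")

# In CPython the multiplication has already been carried out by the compiler,
# so the result is stored among the constants of the compiled code.
folded = 86400 in code_object.co_consts
print("constant folded at compile time:", folded)

namespace = {}
exec(code_object, namespace)
print(namespace["seconds_per_day"])
```

The program gives the same result either way; the optimisation only changes how much work is done at run time.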


Linkers and loaders 


Once a program has been compiled, any separately compiled subroutines must be linked into the 
object code. These may be input or output routines, routines such as a random number generator or 
timer routine which are supplied with the language, or routines written by the programmer and held on 
disk in a library of subroutines. It is the job of the linker to put the appropriate machine addresses in 
all the external call and return instructions, so that the modules are linked together correctly. 


A relocating loader can load the object code anywhere in memory, provided the programmer has used no 
absolute addresses and the object code is in relocatable format. 


Use of libraries 


Library programs are ready-compiled programs, grouped in software libraries, which can be loaded and
run when required. In Windows these often have a .dll extension. Most compiled languages have their
own libraries of pre-written functions which can be invoked in a defined manner from within the
user's program.


Advantages of library routines 
Most programming languages have extensive libraries of built-in functions such as chr(), ascii(),
sqrt() etc. They also have libraries of modules that provide solutions to common problems in everyday
programming, such as mathematical functions, generating random numbers in a specified range and
providing a graphical user interface. These libraries can be imported into a user's program and have
many advantages including:


• they are tested and error-free


• they save the programmer from “re-inventing the wheel” by writing their own code to perform
common tasks


Exercises


1. A programmer is asked to write a program and can choose between using a low-level language or an
imperative high-level language.


Outline the major differences between these two types of languages, naming an example of each.
For each language explain:

• advantages and disadvantages of each one compared to the other

• what translation software would be used, if applicable

• a situation when each one would be the most appropriate choice [10]


2. (a) When translating computer languages, intermediate code may be produced. 
Explain the need for intermediate code and its purpose in a virtual machine. 
The quality of written communication will be assessed in your answer to this question. [8]


(b) State three benefits of using library routines when a program is written. [3] 


A-Level only 


3. The following source code is written in Python. It contains errors.

numbers = [9, 5, 4, 15, 3, 8, 11, 2]
numItems = len(numbers) 
for i in range(numItems - 1): 


for j in range(numItems - i - 
if numbers[j] > numbers [j 
#swap the numbers 
temp = numbers[10]) 
numbers[j] = numbers[j + 1] 
numbers[j + 1] = tem 
drint (numbers) 




Using lines of code from the above program to illustrate your answer, state two things that would be 
done in each of the following stages of compilation: 


(a) Lexical analysis [2] 


(b) Syntax analysis [2] 


4. The process of compilation involves a number of stages. Name the stage at which each of the
following would be detected.


(a) An illegal identifier. [1]
(b) An arithmetic operator is applied to an operand of the data type Boolean. [1]
(c) An operand is omitted from an arithmetic expression. [1]


Section 3


Software development


In this section: 


Chapter 11 Systems analysis methods
Chapter 12 Writing and following algorithms
Chapter 13 Programming paradigms
Chapter 14 Assembly language


Chapter 11 — Systems analysis methods


Objectives 


• Describe the waterfall lifecycle model, agile methodologies, extreme programming, the spiral model
and rapid application development


• Describe the relative merits and drawbacks of different methodologies and when they might be used


Aspects of software development 


There is an infinite variety of different types of problem that can be solved using a computer.
Whether you are developing a website for a new company selling goods or services, designing a
simulation of a physics experiment, building a control system using a microprocessor, or something
else, all software projects have certain aspects in common. Stages of analysis, design, implementation
(which includes coding, testing and documentation), installation and evaluation are common to all
projects, though they may not easily fall into these categories in some methodologies. The tasks
performed at these stages in the old, traditional lifecycle methodology are briefly described below.


Analysis 


Before a problem can be solved, it must be defined. The requirements of the system that solves the 
problem must be established. In the case of a data processing system, or for example the construction 
of a website, this could cover: 


• the data: its origin, uses, volumes and characteristics
• the procedures: what is done, where, when and how, and how errors and exceptions are handled
• the future: development plans and expected growth rates
• problems with any existing system


In the case of a different type of problem such as a simulation or game, the requirements will still need to 
cover a similar set of considerations. 


Design 
Depending on the type of project, the systems designer may consider some or all of the following: 


• processing: the algorithms and appropriate modular structure for the solution, specifying
modules with clear documented interfaces


• data structures: how data will be held and how it will be accessed, for example in a dynamic data
structure such as a queue or tree, or in a file or database


• output: content, format, sequence, frequency, medium (e.g. screen or hard copy) etc.
• input: volume, frequency, documents used, input methods
• user interface: screens and dialogues, menus, special-purpose requirements


• security: how the data is to be kept secure from accidental corruption or deliberate
tampering or hacking


• hardware: selection of an appropriate configuration


Programming and testing


Programming normally involves breaking the problem down into individual modules and further breaking 
these down until each module performs a single well-defined task. The program code is then written in 
the chosen programming language. 


Obviously a system must be thoroughly tested before being installed to make sure that all errors are 
discovered and corrected before going ‘live’. It is part of the designer's job to come up with a test plan 
which will ensure that all parts of the system are properly tested. 


There are several possible testing strategies. 


Black box testing (functional testing) 


Black box testing is carried out independently of the code used in the program. It involves looking at 
the program specification and creating a set of test data that covers all the inputs and outputs and 
program functions. 


White box testing (structural testing) 


White box testing is dependent on the code logic, and derives from the program structure rather than its
function. The program code is studied and tests are devised which test each possible path at least once.
The weakness of white box testing is that it will not detect missing functions: you cannot test what
isn't there!
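The difference between the two strategies can be sketched in code. The function grade, its pass mark of 40 and the test values below are hypothetical, invented purely for this illustration. The black box tests are chosen from the stated specification alone, while the white box tests are chosen by reading the code so that each path is followed at least once.

```python
def grade(mark):
    """Hypothetical specification: marks run from 0 to 100; 40 or more is a pass."""
    if mark < 0 or mark > 100:
        return "invalid"
    if mark >= 40:
        return "pass"
    return "fail"


# Black box tests: valid, boundary and invalid data taken from the specification
black_box_tests = [(0, "fail"), (39, "fail"), (40, "pass"), (100, "pass"),
                   (-1, "invalid"), (101, "invalid")]

# White box tests: one input for each path through the code
white_box_tests = [(150, "invalid"), (75, "pass"), (10, "fail")]


def run_tests(tests):
    """Return the test cases whose actual result differs from the expected one."""
    return [(data, expected) for data, expected in tests if grade(data) != expected]


print(run_tests(black_box_tests), run_tests(white_box_tests))
```

An empty list means every test in that set passed. Neither set would reveal a function that the specification asked for but the programmer never wrote.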


Alpha testing 


Alpha testing is carried out by the software developer's in-house testing team. It is essential because it often 
reveals both errors and omissions in the system requirements definition. The user may discover that the 
system does not in fact have the required functionality because the requirements were not specified carefully 
enough, or because the developer has overlooked or misunderstood something in the specification. 


Beta testing 


When new software is being developed for release as a commercial package, beta testing is often used.
This involves giving the package to a number of potential users who agree to use the system and 
report any problems to the developers. Microsoft, for example, delivers beta versions of its products to 
hundreds of sites for testing. This exposes the product to real use and detects problems and errors that 
may not have been anticipated by the developers. The product can then be modified and sent out for 
further beta testing until the developer is confident enough in the product to put it on the market. 


Implementation 


Coding and testing will be carried out, errors traced and corrected. When all is thought to be satisfactory 
the software will be installed on the user’s system and more testing will be done. At this stage new 
weaknesses and omissions are almost bound to surface and more work will be carried out. 


Evaluation 


The evaluation may include a post-implementation review, which is a critical examination of the system 
three to six months after it has been put into operation. This waiting period allows users and technical 
staff to learn how to use the system, get used to new ways of working and understand the new 
procedures required. It allows management a chance to evaluate the usefulness of the reports and 
on-line queries that they can make, and go through several ‘month-end’ periods when various routine 
reports will be produced. Shortcomings of the system, if there are any, will be becoming apparent at all 
levels of the organisation, and users will want a chance to air their views and discuss improvements. 
The solution should be evaluated on the basis of effectiveness, usability and maintainability.


The post-implementation review will focus on the following:


• a comparison of the system’s actual performance with the anticipated performance objectives
• an assessment of each aspect of the system against preset criteria
• errors which were made during system development
• unexpected benefits and problems


The waterfall lifecycle model 


The Waterfall Model illustrates the methodology described above, in which each step is completed one 
at a time from beginning to end. Each step has specific outputs that lead into the next step. It is possible 
to return to a previous stage if necessary but the model shows that the developers then have to work 
back down through the following stages. 


The user/customer is involved at the start of the process, in the Analysis stage, but then has little input 
until the Evaluation stage. 


This model was adopted from the manufacturing industry, where changes to hardware made later in the
project had high cost implications for work already completed, so it was important to get each stage right
before moving to the next. Although still popular, it has now been superseded by more effective models.


[Figure: the waterfall model, showing the stages in order: Analysis, Design, Implementation, Evaluation, Maintenance]


Spiral model 


The Spiral Model uses the same structured steps but introduces the idea of developing the software in
iterative (repeating) stages. At the start of the process the requirements are defined and the developers
work towards an initial prototype. Each successive loop around the spiral generates a refined prototype
until the product is finished.


Each time around the spiral the following activities are performed:


• Analyse the requirements for the next prototype

• Design the next version, the new prototype

• Implement (code and test) the new prototype

• Evaluate the new prototype, which generates a plan for the next iteration


The Spiral Model is mostly used for large scale projects, for example, projects that take years to deliver. 
Smaller projects use a variation on this called the Agile Model (see below). 


Agile modelling 


At all the stages of analysis, design and implementation, an agile approach may be adopted, as the 
stages of software development may not be completed in a linear sequence. It might be that some 
analysis is done and then some parts of a system are designed and implemented while other parts are 
still being analysed and then, for example, implementation and testing may be intermixed. The developer 
may then go back to design another aspect of the system. 


Throughout the process, feedback will be obtained from the user; this is an iterative process during 
which changes made are incremental as the next part of the system is built. Typically the software 
developers do just enough modelling at the start of the project to make sure that the system is clearly 
understood by both themselves and the users. 


[Figure: prototyping cycle with the stages Requirement gathering, Quick design, Build prototype, Customer evaluation, Refine prototype, Create final system]


At each stage, a prototype is built with user participation to ensure that the system is being developed in
line with what the user wants. The success of the software development depends on:


• keeping the model simple, and not trying to incorporate features which may come in useful at a
later date


• rapid feedback from the user


• understanding that user requirements may change during development as they are forced to
consider their needs in detail


• being prepared to make incremental changes as the model develops


Extreme programming 


Extreme programming (XP) is a software development methodology which is intended to improve 
software quality and responsiveness to changing customer requirements. It is a type of agile software 
development in which frequent “releases” of the software are made in short development cycles. This is 
intended to improve productivity and introduce checkpoints at which new customer requirements can 
be adopted.


Rapid application development (RAD)


Some very large projects may be developed over a long period of time during which both technology and 
user requirements change. Major changes at late stages of development can sometimes lead to projects 
being cancelled or restarted, at considerable cost. In response to this problem the RAD methodology 
was introduced, offering the promise of much faster completion of major projects. The ideas behind
it include:


• workshops and focus groups to gather requirements rather than a formal requirement document


• the use of prototyping to continually refine the system in response to user involvement and feedback


• producing within a strict time limit each part of the system, which may not be perfect but which is
good enough


• reusing any software components which have already been used elsewhere


Relative merits and drawbacks of each development methodology 


The waterfall system lifecycle approach is suitable for very small projects which need careful
supervision, such as those undertaken by students or trainees. The absence of user involvement is a
serious drawback.


The spiral model and the agile approach are an improvement in that they acknowledge that users 
often cannot specify their requirements accurately because they don't understand what is possible. It is 
much easier to examine a working prototype and figure out what needs to be done to it to turn it into a 
useful system. 


Extreme programming and rapid application development are good methodologies for large 
projects where there is a danger of getting bogged down or sidetracked by suggested improvements, so 
that developers are continually chasing a moving target. 


Exercises 


1. A systems analyst/developer is planning a system for the administration of student courses to be
used in an office in a college. 


Describe three tasks that may be carried out by the analyst to establish the requirements of 
the system. [6] 


2. (a) Explain what is meant by the prototyping/agile approach to system analysis and design. [4]
(b) What are the advantages of this approach? [4]
(c) (i) Describe briefly two other approaches to systems development. [6]
(ii) Describe the advantages and disadvantages of each of these approaches. [4]
(iii) State circumstances in which each of the methods you have described
would be appropriate. [2]
3. Explain the difference between black box testing and white box testing. [4]




Chapter 12 — Writing and following algorithms 


Objectives 
• Understand the term ‘algorithm’


• Learn how to write and interpret algorithms using pseudocode


Properties of an algorithm 


A recipe for chocolate cake, a knitting pattern for a sweater or a set of directions to get from A to B, are 
all algorithms of a kind. 


Computational algorithms 
A good algorithm has the following properties: 
• It has clear and precisely stated steps that produce the correct output for any set of valid inputs
• It should allow for invalid inputs
• It must always terminate at some point
• It should execute efficiently, in as few steps as possible
• It should be designed in such a way that other people will be able to understand it and modify
it if necessary


What kinds of problem are solved by algorithms? 


There are thousands of different practical applications of algorithms. Some of the best-known 
applications include: 


• Internet-related algorithms. Algorithms are used to manage and manipulate the huge amount
of data stored on the Internet. How does a search engine find all the pages on which particular
information resides in a fraction of a second?


• Route-finding algorithms. Given two locations, how does a route-finder determine the shortest
or best route between the two points? There may be thousands of possible routes. This type of
algorithm is used not only for driving a vehicle from A to B, but also for many other applications, for
example, finding the best route to transmit packets of data from A to B over a network.


• Compression algorithms. These are used to compress data files so that they can be transmitted
faster or held in a smaller amount of storage space. For example, MP3 files are compressed so that
you can hold thousands of tracks on a mobile phone.


• Encryption algorithms. When someone purchases something over the Internet and sends their
credit card number and other personal details to the store, the data needs to be encrypted so that
even if it is intercepted, it cannot be read.


A simple computational algorithm


Suppose you are given an integer and you need to find its square root. Your calculator can add, subtract, 
multiply and divide but it does not have a square root function. You know that the answer is an integer. 


Here is one way of finding the square root of the integer number: 


1. n = 0 // initialise n
2. nsquared = n * n
3. Is nsquared = number?
4. If yes, output n. If no, add 1 to n and repeat from step 2
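The four steps above could be written in Python roughly as follows. This is a sketch only: the function name integer_square_root is made up for this example, and the code assumes, as the text does, that the number supplied is a perfect square. For any other input the loop would never end.

```python
def integer_square_root(number):
    """Linear search for the square root; assumes number is a perfect square."""
    n = 0                        # step 1: initialise n
    nsquared = n * n             # step 2
    while nsquared != number:    # step 3: is nsquared equal to number?
        n = n + 1                # step 4: if not, add 1 to n ...
        nsquared = n * n         # ... and repeat from step 2
    return n


print(integer_square_root(49))
```

Each pass of the loop tests only one more candidate, so a large number needs many passes.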
When you start to program, it is tempting to get straight to the computer and type in some code to solve
a given problem. However, it will generally save time to figure out the steps needed using paper and
pencil to write down a pseudocode algorithm before you start coding.


Pseudocode is a way of expressing the solution in a way that can easily be translated into a 
programming language. 


The algorithm described will do the job, but a better solution is based on the well-known binary 
search algorithm. 


A “Divide and Conquer” algorithm 


The binary search algorithm uses the “Divide and Conquer” strategy to halve the search area every time a
guess is made. It goes like this:


1. Set low to 1, high to number. Set guess = (low + high)/2 and nsquared = guess²

2. If nsquared > number, set high = guess to eliminate the top half of the range,
otherwise set low = guess to eliminate the bottom half of the range

3. Set guess = (low + high)/2 and nsquared = guess²

4. Repeat steps 2 and 3 until nsquared = number


We can draw a hierarchy chart to represent these steps:


[Hierarchy chart: the top-level box "Find square root" is divided into three blocks: Initialise variables (BLOCK 1), Compare guess with target (BLOCK 2) and Output square root (BLOCK 3). The lower-level boxes are: Initialise number, low, high, middle; Calculate guess; Compare guess; Reset range, recalculate middle]


The chart represents the blocks of program code that we will use to solve the problem. The solution is 
short, so it’s not necessary to put each block in a separate subroutine. 


number L321. 

low = 1
high = number
guess = int((low + high) / 2)
nsquared = guess**2                      BLOCK 1 - SEQUENCE

while nsquared != number
    if nsquared > number
        high = guess
    else
        low = guess
    endif
    guess = int((low + high) / 2)
    nsquared = guess**2                  BLOCK 2 - ITERATION
endwhile

print("square root is ", guess)          BLOCK 3 - SEQUENCE
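For comparison, here is one way the same three blocks might look in Python. It is a sketch under the same assumption as the pseudocode, namely that number is a perfect square of at least 1; for any other value the loop would not terminate. The function name binary_square_root is invented for this example.

```python
def binary_square_root(number):
    """Binary search for the square root; assumes number is a perfect square >= 1."""
    # BLOCK 1 - sequence: initialise the variables
    low = 1
    high = number
    guess = (low + high) // 2      # integer division, like int((low + high) / 2)
    nsquared = guess ** 2

    # BLOCK 2 - iteration: halve the range until the guess is correct
    while nsquared != number:
        if nsquared > number:
            high = guess           # eliminate the top half of the range
        else:
            low = guess            # eliminate the bottom half of the range
        guess = (low + high) // 2
        nsquared = guess ** 2

    # BLOCK 3 - sequence: hand back the result
    return guess


print("square root is", binary_square_root(1521))
```

Because the range is halved on every pass, the number of passes grows far more slowly than in the step-by-step search.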


Interpreting algorithms 


A useful skill is to be able to look at someone else’s algorithm and decide what it does and how it works. 
Of course, if the programmer has put in lots of useful comments, used meaningful variable names and 
split a complicated algorithm into separate modules, that should not be too difficult!


Strategy for interpreting algorithms


Here are some tips, which may seem fairly obvious.


1. Read the comments in the program
2. Look at the variable names to see if they give any clues
3. Follow the steps in the program
4. Try a “dry run” with some test data


Drawing a trace table 


A trace table is a useful tool for performing a dry run through a program. As you follow through the logic 
of the program in the same sequence as the computer does, you note down in the trace table when each 
variable changes and what its value is. 
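A dry run can also be mimicked in code by recording the variables after every pass of a loop. The short sketch below does this for a simple running total; the function name trace_total and the loop itself are invented for this illustration and are not taken from the exercises.

```python
def trace_total(limit):
    """Add up the numbers 1 to limit, recording one trace table row per pass."""
    rows = []
    total = 0
    for count in range(1, limit + 1):
        total = total + count
        rows.append((count, total))    # the values after this pass of the loop
    return rows


print("count  total")
for count, total in trace_total(4):
    print(f"{count:>5}  {total:>5}")
```

Each printed line corresponds to one row that you would write by hand in a trace table.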


Q5: Dry run the algorithm below by completing the table. 


Assume that x has a value of 7. The MOD operator calculates the remainder resulting from an
integer division.

answer = True
for count = 2 to (x - 1)
    remainder = x MOD count
    if remainder = 0 then
        answer = False
    endif
next count


What is the purpose of this algorithm?


Exercises


1. In a football league, the results of each match are input to the computer, which updates each team’s
points.


In the case of a draw, each team (Team A and Team B) gets one point.
If Team A wins, then Team A gets 3 points and Team B gets no points.
The algorithm for updating points in the case of a draw is:


if TeamAGoals == TeamBGoals then
    TeamAPoints = TeamAPoints + 1
    TeamBPoints = TeamBPoints + 1
endif


Write an algorithm for updating the points if there is a winner. [3] 
2. Expert jugglers learn new juggling patterns according to certain rules represented by numbers. In this 
example, the rules for patterns of three numbers are: 
Rule 1: the total value of the numbers in the list must be a multiple of 3 


Rule 2: No number must be one less than the previous number, even if the pattern is 
repeated indefinitely. 


Here are some valid patterns of three numbers: 


744 


441 

Here are some examples of invalid patterns with three numbers:

421 (4+2+1=7, which is not a multiple of 3, so does not obey rule 1)
651 (5 is one less than the previous number, so this does not obey rule 2)
627 (when this is repeated, 627627627..., 6 is one less than the previous number, so this
does not obey rule 2)


(a) State why the following lists of 3 numbers are not valid patterns of numbers.
(i) 516 [1]
(ii) 442 [1]
(b) Write pseudocode for a program which:


• Prompts the user to enter 3 numbers, one after the other


• Outputs “INVALID PATTERN” if the sequence of numbers does not obey the two rules. [7]




SECTION 3 — SOFTWARE DEVELOPMENT 


3. José works for a company that provides loans to its customers. When customers take out a 
loan they decide how much money to borrow and for how many years. 


The interest rate is currently 10% but it may change in the future. 
José writes the following program to calculate the monthly payment for a loan. 


01 program loanCalculator
02
03 CONST INTEREST_RATE = 10
04
05 begin
06     amount = input("Enter amount: ")
07     years = input("Enter years: ")
08     annualInterest = amount * INTEREST_RATE / 100
09     totalToPay = (annualInterest * years) + amount
10     monthlyPayment = totalToPay / (years * 12)
11     print("Monthly Payment:", monthlyPayment)
12 end


(a) Using the code above, show the value that will be output if the inputs are: 


Amount: 600 
Years: 5 


You must show all your working. 


(b) Parentheses have been used in lines 09 and 10. 


(i) State why the parentheses in line 09 are not essential.
(ii) Explain why the parentheses in line 09 are useful.
(iii) Explain why the parentheses in line 10 are essential.


(c) The algorithm uses a constant.


Identify the constant, and explain why a constant has been used.


(d) The company also offers a savings plan. Customers pay a fixed amount each year into
the savings plan. At the end of each year, the company adds the value of the savings plan
at the start of the year to the amount paid, and then adds interest of 10% to obtain the final
value for the year.


For example, if a customer saves £100 each year, the value of the savings plan for 5 years is 
shown in the table below 


Year    Start     Paid in   Interest   Final
1         0.00    100.00     10.00     110.00
2       110.00    100.00     21.00     231.00
3       231.00    100.00     33.10     364.10
4       364.10    100.00     46.41     510.51
5       510.51    100.00     61.05     671.56


Write an algorithm which allows the user to input the amount saved each year and the number 
of years, and outputs the growth of the savings plan in the format shown above. [7] 


OCR F452/01 Qu 2 2014


Chapter 13 — Programming paradigms


Objectives


(A) • Understand the need for and characteristics of a variety of programming paradigms
• Describe the features of procedural languages
(A) • Describe the features of object-oriented languages


A-Level only 


Programming paradigms 


A programming paradigm is a style of computer programming. Different programming languages support 
tackling problems in different ways, and there are four major programming paradigms, each supported by 
a number of different languages. Some languages such as Python, Delphi and Java support more than 
one programming paradigm. 


• Procedural programming is supported by languages such as Python or Pascal, which have a
series of instructions that tell the computer what to do with the input in order to solve the problem.
They are widely used in educational environments, being relatively easy to learn and applicable to
a wide variety of problems. Structured programming is a type of procedural programming which
uses the programming constructs of sequence, selection, iteration and recursion. It uses modular
techniques to split large programs into manageable chunks.


• Object-oriented programming is supported by languages such as Java, Python and Delphi.
OOP was developed to make it possible to abstract details of implementation away from the
programmer, make code reusable and programs easy to maintain. It is to a great extent taking over
from procedural programming.


• Declarative programming is supported by languages such as SQL, where you write statements
that describe the problem to be solved, and the language implementation decides the best way
of solving it. SQL (covered in more detail in Chapter 18) is used to query databases.


• Functional programming is supported by languages such as Haskell, as well as languages such as
Python, C# and Java. Functions, not objects or procedures, are used as the fundamental building
blocks of a program. Statements are written as a series of functions which accept input data as
arguments and return an output. Functional programming is not covered in this course.
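Because Python supports more than one paradigm, the same small task can be written in two of the styles described above. The sketch below is an illustration only: the list of numbers and the task (adding up the squares of the even numbers) are invented for this example.

```python
numbers = [3, 8, 5, 12, 7]

# Procedural style: a series of instructions that update variables step by step
total = 0
for value in numbers:
    if value % 2 == 0:
        total = total + value * value

# Functional style: the same result built from functions applied to the data
evens = filter(lambda value: value % 2 == 0, numbers)
functional_total = sum(map(lambda value: value * value, evens))

print(total, functional_total)
```

Both versions give the same answer. The difference lies in how the solution is expressed, not in what is calculated.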


Different types of problem require different types of language, and hundreds of different languages have
been developed for different types of application. Assembly language was the first language to
be developed after machine code, and the next step was the development of procedural languages
in the 1960s.


Procedural languages 


A procedural language has built-in data types such as integer, real or floating point numbers,
character, Boolean and string. In addition, it typically has data structures such as array and record.
Programmers can define their own abstract data types such as queue, stack, tree, or hash table, all
of which you will study during this course.


Consider an abstract data structure such as a stack. This can be visualised like a stack of plates.
You can only add an item to the top of a stack, and you can only remove an item from the top.
The programmer might decide there is a limit to how large the stack is allowed to get, so you can't add
to a full stack, and obviously, you can’t remove an item from an empty stack.


This abstract data structure can be implemented in different ways. In Python, it could be implemented
with the built-in list data structure. In Pascal, you could use an array, with a pointer to the top of the
stack. The important thing is that someone using this data structure should not need to know how it is
implemented, any more than they need to know how a square root is worked out when they press the √
(square root) button on a calculator. All the user needs to know is the state and behaviour of the
data structure.
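As a minimal sketch of the Python approach mentioned above, a stack can be held in a built-in list and used only through two subroutines. The names push, pop and MAX_SIZE, and the limit of three items, are assumptions made for this example.

```python
MAX_SIZE = 3   # hypothetical limit on how large the stack may get


def push(stack, item):
    """Add item to the top of the stack; refuse if the stack is full."""
    if len(stack) == MAX_SIZE:
        raise OverflowError("cannot add to a full stack")
    stack.append(item)


def pop(stack):
    """Remove and return the item on top; refuse if the stack is empty."""
    if len(stack) == 0:
        raise IndexError("cannot remove from an empty stack")
    return stack.pop()


plates = []
push(plates, "blue plate")
push(plates, "red plate")
top = pop(plates)          # the last plate added is the first one removed
print(top, plates)
```

Code that uses only push and pop would carry on working if the list were later swapped for a different implementation.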


Clearly, it is a waste of time for every programmer who needs to use a stack to have to decide how to
implement it and write their own subroutines to add and remove items from it. This is where an
object-oriented approach comes in.


A-Level only


Object-oriented languages


In an object-oriented language, we define a class as the description of what the data looks like (the
state) and what the data can do (the behaviour). The user of a class sees only the state and behaviour of
a data item. Data items are called objects, where an object is an instance of a class.


Programming in an object-oriented language requires thinking in terms of the objects that will carry out 
the required tasks, rather than thinking about data structures and algorithms. We are familiar with the 
concept of objects in the physical world — they could be cats, dogs, plates, cars, patients, doctors, 
students, and so on. 


In a hospital system, objects might be patient, ward, doctor, nurse and so on. Each of these objects can
be defined as a class, with its own set of behaviours. Each individual ward will be a single instance of
the class called ward. The class will have attributes such as name, number_of_beds,
number_of_patients, location, type. A particular instance of the ward may have name Bramford,
number_of_beds 6, location Block E, number_of_patients 35, type Children’s. Its behaviours might
include admit patient, discharge patient.
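A class along these lines might be sketched in Python as shown below. The attribute names follow the text, but the method bodies are assumptions made for illustration: in particular, starting each ward with no patients and refusing an admission when every bed is taken are choices made for this example, not rules from the text.

```python
class Ward:
    def __init__(self, name, number_of_beds, location, ward_type):
        # State: the attributes held by every ward object
        self.name = name
        self.number_of_beds = number_of_beds
        self.number_of_patients = 0        # assumed: a new ward starts empty
        self.location = location
        self.ward_type = ward_type

    # Behaviour: what a ward object can do
    def admit_patient(self):
        """Admit one patient; return False if every bed is already taken."""
        if self.number_of_patients == self.number_of_beds:
            return False
        self.number_of_patients += 1
        return True

    def discharge_patient(self):
        """Discharge one patient; return False if the ward is empty."""
        if self.number_of_patients == 0:
            return False
        self.number_of_patients -= 1
        return True


bramford = Ward("Bramford", 6, "Block E", "Children's")   # one instance of the class
bramford.admit_patient()
print(bramford.name, bramford.number_of_patients)
```

Every ward created from this class has the same attributes and behaviours, but its own values.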


Inheritance 


Below is a simple example of a class and its subclasses. Suppose an object-oriented program used
by an estate agent defines a class called Property. Property has attributes including address,
owner, type, number_of_bedrooms, price.


The class Property has two subclasses called Property_For_Rent and Property_For_Sale.
The subclasses have the same attributes as the superclass Property, and in addition, each has some
attributes of its own. The subclasses are said to inherit properties from the superclass, and we can draw
an inheritance diagram.
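The relationship between the superclass and its two subclasses could be sketched in Python as follows. The class names are written in Python's usual style, and the extra attributes minimum_lease_months and under_offer are hypothetical, since the text does not say which attributes each subclass adds.

```python
class Property:
    def __init__(self, address, owner, property_type, number_of_bedrooms, price):
        self.address = address
        self.owner = owner
        self.property_type = property_type
        self.number_of_bedrooms = number_of_bedrooms
        self.price = price


class PropertyForRent(Property):
    def __init__(self, address, owner, property_type, number_of_bedrooms, price,
                 minimum_lease_months):
        # Reuse the superclass constructor for the inherited attributes
        super().__init__(address, owner, property_type, number_of_bedrooms, price)
        self.minimum_lease_months = minimum_lease_months   # hypothetical attribute


class PropertyForSale(Property):
    def __init__(self, address, owner, property_type, number_of_bedrooms, price,
                 under_offer=False):
        super().__init__(address, owner, property_type, number_of_bedrooms, price)
        self.under_offer = under_offer                     # hypothetical attribute


flat = PropertyForRent("1 High Street", "A. Owner", "flat", 2, 850, 6)
print(flat.address, flat.number_of_bedrooms, flat.minimum_lease_months)
```

The object flat has all five attributes defined in Property without any of them being declared again in the subclass.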


[Class diagram: Property_For_Rent and Property_For_Sale, each joined by an arrow to the superclass Property]


In the class diagram, inheritance is shown using an open-headed arrow.


A-Level only


Example 
A class called DataStructure is created in an object-oriented language. The DataStructure class 
has two subclasses called Stack and Queue. Both Stack and Queue inherit attributes name, size, 
isEmpty and isFull from the superclass. They also inherit methods called addItem, removeItem. 


[Class diagram: the subclasses Stack and Queue inherit from the superclass DataStructure]


Classes are defined differently in different programming languages, but in the pseudocode that is used on 
this course, the superclass might be defined like this: 


class DataStructure
    private size
    private isFull
    private isEmpty
    public procedure new(structureSize)
        size = structureSize
    endprocedure
    public procedure addItem(parameter)
        (instructions to add an item to end of data structure)
    endprocedure
    public function removeItem
        (instructions to remove first item from data structure)
    endfunction
endclass


In the definition, attributes are generally described as private. This means that users cannot directly 
access them. They are changed through statements within the various methods. Methods fall into one 
of two categories — functions, which return a value, and procedures, which do not. For example, when 
an item is to be added to a data structure, the item which is to be added is passed as a parameter, 
but nothing is returned. If an item is to be removed, no parameter is needed, and the item removed is 
returned from the function. 


The class Stack could be defined as follows: 


class Stack inherits DataStructure 
    public function removeItem 
        (instructions to remove item from end of stack) 
    endfunction 
endclass 


The attributes size, isFull, isEmpty and the methods new, addItem are inherited from the 
DataStructure class. 
Polymorphism 


Polymorphism refers to a programming language's ability to process objects differently depending on 
their class. 


A class of objects has behaviours or methods, all of which will be inherited by its subclasses. 


In this example the class Stack defines its own method removeItem. This is because, although 
there is a method of the same name in the superclass which it could inherit, it will process a Stack 
object differently. In a stack, the last item in the stack will be removed. In the superclass, assume that 
the method removeItem removes the first item in the data structure. In the case of a queue, this is 
fine, but it is not what is required for the stack. However, both Stack and Queue objects carry out the 
method addItem in an identical way, adding the item to the end of the data structure. Hence, the method 
addItem does not need to be redefined in the Stack class definition. 


This is what is meant by polymorphism; the subclass Stack redefines the method removeItem 
defined in the superclass DataStructure to process objects in the class differently. 


The attributes size, isFull, isEmpty are all defined in the superclass DataStructure. 
These attributes cannot be accessed directly if they are declared private; they can only be accessed 
through the class methods. This is known as encapsulation. 


Constructors and inheritance 


Inheritance is denoted by the inherits keyword, and superclass methods are called with the 
keyword super, e.g. super.new(stackSize). 


A procedure with the name new is a constructor. To create a new object called myStack of size 20 
which belongs to class Stack, the following statement would be written: 


myStack = new Stack (20) 
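

The DataStructure, Stack and Queue example can be sketched in Python, one of many languages that support these ideas. This is a minimal illustration rather than a definitive implementation: the list _items, the snake_case method names and the error types are assumptions made for the sketch, and Python marks attributes as private only by the convention of a leading underscore.

```python
class DataStructure:
    """Superclass: items are added at the end and removed from the front."""

    def __init__(self, structure_size):
        # A leading underscore marks an attribute as private by convention
        self._size = structure_size
        self._items = []

    def is_empty(self):
        return len(self._items) == 0

    def is_full(self):
        return len(self._items) == self._size

    def add_item(self, item):
        if self.is_full():
            raise OverflowError("data structure is full")
        self._items.append(item)          # add to the end

    def remove_item(self):
        if self.is_empty():
            raise IndexError("data structure is empty")
        return self._items.pop(0)         # remove the first item


class Queue(DataStructure):
    """Inherits every method unchanged: first in, first out."""


class Stack(DataStructure):
    def __init__(self, stack_size):
        super().__init__(stack_size)      # call the superclass constructor

    def remove_item(self):
        # Redefines the inherited method: last in, first out
        if self.is_empty():
            raise IndexError("stack is empty")
        return self._items.pop()          # remove the last item


my_stack = Stack(20)
my_queue = Queue(20)
for value in (1, 2, 3):
    my_stack.add_item(value)              # inherited, identical for both
    my_queue.add_item(value)

print(my_stack.remove_item())             # 3 (last item added)
print(my_queue.remove_item())             # 1 (first item added)
```

The same call, remove_item, behaves differently depending on the class of the object, which is the polymorphism described above.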


More detail on object-oriented programming is given in Section 11, Chapter 58. 


Advantages of the object-oriented paradigm 


Building code into objects has a number of advantages, including: 


* The object-oriented methodology forces designers to go through an extensive planning phase, which 
makes for better designs with fewer weaknesses 


* Encapsulation: the source code for an object can be written, tested and maintained independently of 
the code for other objects 


* Once an object is created, knowledge of how its methods are implemented is not necessary in order 
for a programmer to use it 


* New objects can easily be created with small differences to existing ones 


* Reusability: objects that are already defined, coded and tested may be used in many 
different programs 


* OOP provides a good framework for code libraries with a range of software components that can 
easily be adapted by a programmer 


* Software maintenance: an object-oriented program is much easier to maintain than one written in a 
procedural language because of its modular structure 
Exercises 


1. A programming paradigm is a style of computer programming. Procedural programming, supported 
by languages such as Python or Pascal, which have a series of instructions that tell the computer 
what to do with the input in order to solve the problem, is one example of a paradigm. 


Name and briefly describe two other programming paradigms, giving an example of an 
application of each and a language which supports it. [6] 


2. (a) Explain what is meant by the term class in object-oriented programming. [2] 


(b) An institution categorises its staff as either Academic or Administration. Administration 
staff may be either Salaried or HourlyPaid. 


Five classes are to be created in an object-oriented programming language. 
(i) Draw a class diagram for the five classes. [3] 
(ii) Describe what is meant by polymorphism. [1] 


(iv) Explain how this might apply to a method called CalculatePay in the 
class Administration. [2] 


3. The system used by a garden centre to store and retrieve details of its products is written in an 
object-oriented language. Part of the design is shown on the class diagram. 


ProductCode 
Name 
Price 


setPrice( ) 
findPrice( ) 


FlowerColour 
Variety 


findVariety( ) 


Explain the terms class, derived class, inheritance and encapsulation, using examples from the 
garden centre. 


The quality of written communication will be assessed in your answer to this question. [8] 


OCR F453/01 Qu 6 June 2013 
Chapter 14 — Assembly language 


Objectives 


* Be able to write and follow simple assembly language programs 


* Understand and apply immediate, direct, indirect and indexed addressing modes 


Assembly language instructions 


Machine code was the first “language” used to enter programs by early computer programmers. The next 
advance in programming was to use mnemonics instead of binary codes, and this was called assembly 
code or assembly language. Each assembly language instruction translates into one machine 
code instruction. 


Assembly code uses mnemonics to represent the operation codes and addresses. Typically, 2-, 3- or 
4-character mnemonics are used to represent all the machine code instructions. The assembler then 
translates the assembly language program into machine code for execution. 


The following table shows mnemonics for instructions in the instruction set of the Little Man Computer, 
which is an imaginary computer designed to enable you to easily enter and test assembly 
language programs. 


Mnemonic  Instruction                       Description
ADD       ADD                               Add the contents of the memory address to the Accumulator
SUB       SUBTRACT                          Subtract the contents of the memory address from the Accumulator
STA       STORE                             Store the value in the Accumulator in the memory address
LDA       LOAD                              Load the Accumulator with the contents of the memory address given
BRA       BRANCH (unconditional)            Branch: use the address given as the address of the next instruction
BRZ       BRANCH IF ZERO (conditional)      Branch to the address given if the Accumulator is zero
BRP       BRANCH IF POSITIVE (conditional)  Branch to the address given if the Accumulator is zero or positive
INP       INPUT                             Input into the accumulator
OUT       OUTPUT                            Output contents of accumulator
HLT       HALT                              Stops the execution of the program
DAT       DATA                              Used to indicate a location that contains data


Table 14.1 
You can experiment with the LMC at http://peterhigginson.co.uk/LMC/ 


Figure 14.1: The LMC computer simulator 
All the examples below use instructions from Table 14.1. 


Data transfer and arithmetic operations 


Example 1 
Input three numbers x, y and z. Calculate and output the value of x + y - z 


        INP         ;Input y into accumulator (ACC) 
        STA y       ;Store the number in y 
        INP         ;Input z into ACC 
        STA z       ;Store the number in z 
        INP         ;Input x into ACC 
        ADD y       ;Add y to the number in ACC 
        SUB z       ;Subtract z from the number in ACC 
        OUT         ;Output the number in ACC 
        HLT         ;Halt 
x       DAT 
y       DAT 
z       DAT 


Branch instructions 


The flow of the program can be altered using a conditional or unconditional branch instruction. 
The conditional branch instructions BRP (Branch if zero or positive) and BRZ (Branch if zero) cause a branch to 
a given label in the program depending on the value held in the accumulator. 


An unconditional branch instruction (BRA) will cause a branch whatever the value held in 
the accumulator. 
Example 2 


Compare the two numbers held in memory locations num1 and num2, and output the larger. If they are 
equal, either one can be output. 


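         LDA num1       ; Load num1 into ACC 
         SUB num2       ; ACC now holds num1 - num2 
         BRP firstmax   ; Branch if num1 >= num2 
         LDA num2       ; Otherwise num2 is the larger 
         OUT            ; Output num2 from ACC 
         HLT 
firstmax LDA num1       ; num1 is the larger (or they are equal) 
         OUT            ; Output num1 from ACC 
         HLT 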


Example 3 


Write an assembly code program which performs integer division. The program inputs two numbers big 
and small, and outputs the result of big divided by small, ignoring the remainder. 


There is no division instruction in this instruction set, so we have to repeatedly subtract small from 
big until big becomes less than zero. Each time a subtraction leaves a result of zero or more, we add 1 
to a variable called answer. 


        INP             ; Input the number 1 
        STA one         ; Store in one 
        INP             ; Input the number 0 
        STA answer      ; Store in answer 
        INP             ; Input the divisor 
        STA small       ; Store in small 
        INP             ; Input the number to be divided 
        STA big         ; Store value in big 
next    SUB small       ; Subtract small from ACC which contains big 
        STA big         ; Store in big 
        BRP more        ; Branch if ACC positive or zero to more 
        LDA answer 
        OUT             ; Output the answer in ACC 
        HLT             ; Halt 
more    LDA answer      ; Load answer into ACC 
        ADD one         ; Increment ACC 
        STA answer      ; Store in answer 
        LDA big         ; Load what is left of big 
        BRA next        ; Branch to next 
x       DAT 
one     DAT 
big     DAT 
small   DAT 
answer  DAT 
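

To check the logic of a program like Example 3 away from the simulator, it can be traced with a few lines of Python. The function run_lmc below is a hypothetical, much simplified interpreter written for this illustration: it supports only the mnemonics in Table 14.1, uses ordinary Python integers (so it does not model the 0 to 999 limit of the real LMC), and treats BRP as "branch if ACC is zero or positive".

```python
MNEMONICS = {"ADD", "SUB", "STA", "LDA", "BRA", "BRZ", "BRP",
             "INP", "OUT", "HLT", "DAT"}


def run_lmc(source, inputs):
    """Assemble and run a small LMC program, returning the list of outputs."""
    labels, program = {}, []
    for raw in source.splitlines():
        fields = raw.split(";")[0].split()      # drop the comment, split fields
        if not fields:
            continue
        if fields[0] not in MNEMONICS:          # first field is a label
            labels[fields[0]] = len(program)
            fields = fields[1:]
        program.append(fields)

    # Every DAT location starts at 0 unless an initial value is given
    memory = {}
    for address, fields in enumerate(program):
        if fields[0] == "DAT":
            memory[address] = int(fields[1]) if len(fields) > 1 else 0

    acc, pc, outputs = 0, 0, []
    inputs = list(inputs)
    while True:
        op, *operand = program[pc]
        target = labels[operand[0]] if operand else None
        pc += 1
        if op == "INP":
            acc = inputs.pop(0)
        elif op == "OUT":
            outputs.append(acc)
        elif op == "HLT":
            return outputs
        elif op == "LDA":
            acc = memory[target]
        elif op == "STA":
            memory[target] = acc
        elif op == "ADD":
            acc += memory[target]
        elif op == "SUB":
            acc -= memory[target]
        elif op == "BRA":
            pc = target
        elif op == "BRZ" and acc == 0:
            pc = target
        elif op == "BRP" and acc >= 0:
            pc = target


DIVIDE = """
        INP
        STA one
        INP
        STA answer
        INP
        STA small
        INP
        STA big
next    SUB small
        STA big
        BRP more
        LDA answer
        OUT
        HLT
more    LDA answer
        ADD one
        STA answer
        LDA big
        BRA next
one     DAT
big     DAT
small   DAT
answer  DAT
"""

# Inputs in order: the number 1, the number 0, the divisor, the number to divide
print(run_lmc(DIVIDE, [1, 0, 2, 7]))    # [3], since 7 divided by 2 is 3 remainder 1
```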
Format of machine code instructions 


In Chapter 2, the basic structure of a machine code instruction in a 16-bit word was described as having 
a format similar to that shown below. 


| Operation code                             | Operand(s) |
| Basic machine operation | Addressing mode  |            |


The LMC instruction set has only 11 instructions, and the imaginary machine has only 100 memory 
locations. The maximum data value is 999, which can be held in 10 bits. Four bits would be enough to 
store the operation code, and 7 bits would be enough to store the operand. A word size of 16 bits would 
be plenty big enough to hold an instruction or a data value. 
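

The bit counts quoted above can be checked with a short calculation. This is a small sketch in Python; the helper name bits_needed is an assumption made for the illustration.

```python
def bits_needed(distinct_values):
    """Smallest number of bits able to represent this many distinct values."""
    return max(1, (distinct_values - 1).bit_length())


print(bits_needed(11))     # 11 instructions          -> 4 bits
print(bits_needed(100))    # addresses 0 to 99        -> 7 bits
print(bits_needed(1000))   # data values 0 to 999     -> 10 bits
```

Four bits for the opcode plus seven for the operand gives 11 bits, which fits comfortably in a 16-bit word.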


In a real computer, there will be considerably more than 11 instructions in the instruction set. It will 
include, for example, multiply and divide in the arithmetic instructions, and shift instructions to shift bits 
left or right. 


There will also normally be up to 16 registers in which calculations can be carried out, rather than a 
single accumulator. 


A-Level only 


Addressing modes 
The operation code (opcode) consists of binary digits representing the basic operation such as ADD or 
LOAD, and a 2-bit code representing the addressing mode. 


There are four different addressing modes, which are indicated by the bit pattern that is the last two bits 
of the opcode: 


* Using immediate addressing, the operand is the actual value to be operated on, say 3 or 75. 


* Using direct addressing, the operand holds the memory address of the value to be operated on. This 
is the only addressing mode used in the LMC assembly language. 


* Using indirect addressing, the operand is the location (typically a register) which holds the address of 
the data we want. This enables a larger range of addressable locations. 


* Using indexed addressing, the address of the operand is obtained by adding to the contents of a 
general register (called the index register) a constant value. The number of the index register and the 
constant value are included in the instruction code. Indexed addressing mode is used to access an array 
whose elements are in successive memory locations. 
Examples of the use of each of these are given below. 


Suppose contents of accumulator, index register and a section of memory are as follows: 


Accumulator ACC holds 25 
Index register holds 6 
Register 0 (R0) contains 0 
Memory (as used in the examples below): location 4 holds 8, location 6 holds 15, location 7 holds 32, 
location 8 holds 27 


load immediate 4 will put the value 4 into ACC 
load direct 4 will put 8 (the contents of location 4) into ACC 
load indirect 4 will put 27 (the contents of the address held in location 4) into ACC 
load indexed R0 will put 15 (contents of location (6 + contents of R0)) into ACC 


If R0 is now incremented to contain 1, 


load indexed R0 will put 32 (contents of location (6+1)) into ACC 


By incrementing the value in R0, successive memory locations can be accessed. 
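

The four modes can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the dictionary memory holds just the four locations whose contents are quoted in the examples, and the function load and its mode names are hypothetical, chosen to mirror the wording above rather than any real instruction set.

```python
# Contents of the memory locations quoted in the examples
memory = {4: 8, 6: 15, 7: 32, 8: 27}
INDEX_REGISTER = 6
registers = {"R0": 0}


def load(mode, operand):
    """Return the value that would be placed in the accumulator."""
    if mode == "immediate":
        return operand                                  # the operand is the value
    if mode == "direct":
        return memory[operand]                          # the operand is an address
    if mode == "indirect":
        return memory[memory[operand]]                  # the operand holds an address
    if mode == "indexed":
        return memory[INDEX_REGISTER + registers[operand]]
    raise ValueError("unknown addressing mode: " + mode)


print(load("immediate", 4))    # 4
print(load("direct", 4))       # 8
print(load("indirect", 4))     # 27
print(load("indexed", "R0"))   # 15

registers["R0"] += 1           # step on to the next array element
print(load("indexed", "R0"))   # 32
```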


Exercises 


1. (a) In a particular machine code, the opcode is stored in 6 bits and the operand is stored in 
12 bits. What is the maximum number of operations in the machine's instruction set? 


(b) Explain, with the aid of examples, the difference between immediate, direct and 
indirect addressing. 


2. Using instructions ADD x (Add number stored in x to the accumulator) 
LDA x (Load into the accumulator the value stored in x) 
STA x (Store the value in the accumulator in location x) 


write an assembly language program that adds together the values stored in memory locations 
num1 and num2, storing the resulting total in memory location num3. 


3. Write an assembly language program which counts and outputs the number of values entered 
by the user, and the total of the values input. End of input is signalled by dummy value 0. You 
may assume that memory locations called increment, total and numvals contain 
1, 0 and 0 respectively. (Use LMC assembly language instructions.) 
Section 4 


Exchanging data 


In this section: 


Chapter 15  Compression, encryption and hashing 
Chapter 16  Database concepts 
Chapter 17  Relational databases and normalisation 
Chapter 18  Introduction to SQL 
Chapter 19  Defining and updating tables using SQL 
Chapter 20  Transaction processing 
Chapter 15 — Compression, encryption and hashing 


Objectives 

* Know why sound and images are often compressed 
* Understand how other files can be compressed 
* Understand the difference between lossless and lossy compression 
* Explain the advantages and disadvantages of different compression techniques 
* Explain run length encoding and dictionary based compression 
* Define symmetric and asymmetric encryption 
* Understand how and why hashing may be used to encrypt data 


Why use compression? 


File compression techniques were developed to reduce the storage space of files on disk. With disk 
storage becoming larger and cheaper, this is less important these days, but the reduction of file size has 
become even more important in the sharing and transmission of data. Internet Service Providers (ISPs) 
and mobile phone networks impose limits and charges on bandwidth. Images on websites need to be in 
a compressed format to enable a web page to load quickly. Even on a fast connection, music and video 
streaming must take advantage of compression in order to reduce buffering. (In streaming audio or video 
from the Internet, buffering refers to downloading a certain amount of data to a temporary storage area or 
buffer, before starting to play a section of the music or movie.) 


Compression can be either lossy, where unnecessary information is removed from the original file, or 
lossless. Lossless compression retains all information required to replicate the original file exactly. 


Lossy compression 


Lossy compression works by removing non-essential information. The two JPG images below are clearly 
identifiable as the same thing, but one has been heavily compressed, displaying untidy and blocky 
compression artefacts as a consequence. We can still make out the subject of the image well, 
but the heavier the compression, the greater the cost in quality. 


[Images: original image, 310KB; heavily compressed image, 5.7KB] 


The compression of sound and video works in a similar way. MP3 files use lossy compression to remove 
frequencies too high for most of us to hear and to remove quieter sounds that are played at the same 
time as louder sounds. The resulting file is about 10% of original size, meaning that 1 minute of MP3 
audio equates to roughly 1MB in size. 


Voice is transmitted over the Internet or mobile telephone networks using lossy compression and 
although we have no problem in understanding what the other person is saying, we can recognise the 
difference in quality of a voice over a phone rather than in person. The apparent difference is lost data. 


Lossless compression 


Lossless compression works by recording patterns in data rather than the actual data. Using these 
patterns and a set of instructions on how to use them, the computer can reverse the procedure and 
reassemble an image, sound or text file with exact accuracy and no data is lost. This is most important 
with the compression of program files, for example, where a single lost character would result in an error 
in the program code. By contrast, a pixel with a slightly different colour in an image would not be of 
huge consequence in most cases. Lossless compression usually results in a much larger file than lossy 
compression, but one that is still significantly smaller than the original. 


A-Level only 


Run Length Encoding (RLE) 


If you were ordering food from a takeaway restaurant for a group of five friends, it is likely that you 
might ask for “5 pizzas” rather than “one pizza, and another pizza, and another pizza etc.” Run Length 
Encoding exploits the same principle. Rather than recording every pixel in a sequence, it records its 
value and the number of times it repeats. 


For this section of the balloon image, the encoding for the first row might crudely translate to: 
6 green, 8 yellow and 17 orange, using one binary value for the number of contiguous matching pixels 
in the run and another for the colour value. This would reduce the data necessary to store this row to 6 
bytes (00000110 00000001 00001000 00000010 00010001 00000011) rather than 31 bytes, assuming a 
bit depth of 8 and values for each colour of 00000001, 00000010 and 00000011. 


[Figure: first row of the balloon image, with runs of 6, 8 and 17 pixels] 
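

The idea can be sketched in a few lines of Python. The colour values 1, 2 and 3 follow the ones assumed in the text; the function names rle_encode and rle_decode are chosen for this illustration, and each pair is stored as (run length, colour value) to match the byte order shown above.

```python
from itertools import groupby


def rle_encode(pixels):
    """Replace each run of identical values with a (run length, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Rebuild the original sequence exactly from the pairs."""
    return [value for count, value in pairs for _ in range(count)]


GREEN, YELLOW, ORANGE = 1, 2, 3
row = [GREEN] * 6 + [YELLOW] * 8 + [ORANGE] * 17

encoded = rle_encode(row)
print(encoded)                      # [(6, 1), (8, 2), (17, 3)]
print(len(row), "bytes before,", 2 * len(encoded), "bytes after")
print(rle_decode(encoded) == row)   # True: no data is lost
```

Run length encoding only pays off when the data contains long runs. A row in which every pixel differs from its neighbour would double in size.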
Dictionary-based compression techniques 


Suppose that instead of sending a complete message, a copy of the Oxford English dictionary was sent 
alongside a coded message using the page number and the position of the word on that page. The word 
‘pelican’ falls on page 249 as the 7th word on that page. This could be sent as 249,7 using only 2 
bytes, considerably fewer than the 7 bytes it would take to send the complete word. (Ignore, for now, the 
additional space that it would take to send the dictionary with it!) 


Dictionary based compression works in a similar way. The compression algorithm searches through the 
text to find suitable entries in its own dictionary (or it may use a known dictionary) and translates the 
message accordingly. 


[Dictionary table: 1 = Do, 2 = unto, 3 = others, 4 = as, 5 = you, 6 = would, 7 = have, 8 = do] 


Using the dictionary table above, the saying “Do unto others as you would have others do unto 
you” would be compressed as 12345673825: 11 codes which, at 3 bits per code for an 8-entry 
dictionary, take only 33 bits in binary. This compares to 51 characters or 51 bytes (408 bits), a 
reduction of 92%. This still ignores the fact that the dictionary must also be stored with the text, but 
with a longer body of text to be compressed, a dictionary becomes quite insignificant in size compared 
with the original, and the original message can still be reassembled perfectly. 
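

A very simple word-based version of this scheme can be sketched in Python. It is an illustration only: the functions compress and decompress are hypothetical, the dictionary is built from the message itself, and practical dictionary coders work on repeated byte sequences rather than whole words.

```python
def compress(text):
    """Give each new word the next free code; return the dictionary and codes."""
    dictionary = {}
    codes = []
    for word in text.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
        codes.append(dictionary[word])
    return dictionary, codes


def decompress(dictionary, codes):
    """Look each code up again to rebuild the original text."""
    words = {code: word for word, code in dictionary.items()}
    return " ".join(words[code] for code in codes)


saying = "Do unto others as you would have others do unto you"
dictionary, codes = compress(saying)

print(codes)                                        # [1, 2, 3, 4, 5, 6, 7, 3, 8, 2, 5]
bits_per_code = (len(dictionary) - 1).bit_length()  # 8 entries need 3 bits
print(len(codes) * bits_per_code, "bits against", len(saying) * 8)
print(decompress(dictionary, codes) == saying)      # True
```

Note that "Do" and "do" receive different codes because the comparison is case-sensitive, which is why the dictionary has eight entries rather than seven.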


Encryption 


Encryption is the transformation of data from one form to another to prevent an unauthorised third party 
from being able to understand it. The original data or message is known as plaintext. The encrypted 
data is known as ciphertext. The encryption method or algorithm is known as the cipher, and the 
secret information to lock or unlock the message is known as a key. 


The Caesar cipher and the Vernam cipher offer polar opposite examples of security. Where the Vernam 
offers perfect security, the Caesar cipher is very easy to break with little or no computational power. 
There are many other methods of encryption - some of which may take many computers many years 
to break, but almost all of these are still breakable and the principles behind them are similar. 


The Caesar cipher 


Julius Caesar is said to have used this method to keep messages secure. The Caesar cipher (also 
known as a shift cipher) is a type of substitution cipher and works by shifting the letters of the 
alphabet along by a given number of characters; this number is the key. Below is an example of 
a shift cipher using a key of 5. 


Plaintext:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext:  F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
You will no doubt be able to see the ease with which you can decrypt a message using this system. 


DGYDQFH WR ERUGHU DQG DWWDFN DW GDZQ 


Even if you had to attempt a brute force attack on the message above, there are only 26 different 
possibilities. Otherwise you might begin by guessing the likelihood of certain characters first and go from 
there. Using cryptanalysis on longer messages, you would quickly find the most common ciphertext letter 
and could start by assuming this was an E, for example; or perhaps an A. (Hint.) 
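

Both the cipher and a brute force attack on it take only a few lines of Python. This is a minimal sketch: the function name caesar and the sample message are assumptions made for the illustration, and only the capital letters A to Z are shifted.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, key):
    """Shift each letter along the alphabet by key places (negative to decrypt)."""
    shift = key % 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))


ciphertext = caesar("HELLO WORLD", 5)
print(ciphertext)                 # MJQQT BTWQI
print(caesar(ciphertext, -5))     # HELLO WORLD

# Brute force attack: try every possible key and look for readable text
for key in range(26):
    print(key, caesar(ciphertext, -key))
```

With only 26 keys to try, the attack completes instantly, which is why the cipher offers no real security.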


The Vernam cipher 


The Vernam cipher, invented in 1917 by the American scientist Gilbert Vernam, is the only cipher 
still proven to be unbreakable. All others are based on computational security and are theoretically 
discoverable given enough time, ciphertext and computational power. 


One-time pad 
The encryption key or one-time pad must be equal to or longer in characters than the plaintext, be truly 
random and be used only once. One-time pads are used in pairs where the sender and recipient are both 
party to the key. Both must meet in person to securely share the key and destroy it after encryption or 
decryption. Since the key is random, so will be the distribution of the characters meaning that no amount 
of cryptanalysis will produce any meaningful results. 


The bitwise exclusive or (XOR) 


An XOR operation is carried out between the binary character value of the first character of the plaintext 
and the first character of the one-time pad. Use the ASCII chart on page 160 for reference. 




Using this method, the message “Meet on the bridge at 0300 hours” encrypted using a one-time pad 
of +tkiGeMxGvnhoQ0xQDIllVdT4slJm9qf will produce the ciphertext: 


Fdfig#x3SH#Y6!i(=vTg?Ci"7L UC 
The encryption process will often produce strange symbols or unprintable ASCII characters as in the 
above example, but in practice it is not necessary to translate the encrypted code back into character 
form, as it is transmitted in binary. To decrypt the message, the XOR operation is carried out on the 
ciphertext using the same one-time pad, which restores it to plaintext. 
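

The XOR step is easy to demonstrate in Python. In this sketch the function name vernam is an assumption, and the pad is generated with the standard library's secrets module, which draws on the operating system's random source. That is convenient for a demonstration, but it is not a substitute for the truly random, physically generated pad that the Vernam cipher requires.

```python
import secrets


def vernam(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of the pad."""
    if len(pad) < len(data):
        raise ValueError("the one-time pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


plaintext = b"Meet on the bridge at 0300 hours"
pad = secrets.token_bytes(len(plaintext))   # demonstration pad, used once only

ciphertext = vernam(plaintext, pad)         # encrypt
recovered = vernam(ciphertext, pad)         # the same operation decrypts

print(ciphertext.hex())                     # often unprintable, so shown in hex
print(recovered.decode("ascii"))            # Meet on the bridge at 0300 hours
```

The same function both encrypts and decrypts because XORing a value twice with the same pad byte returns the original value.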


Cryptanalysis and perfect security 


Other ciphers that use non-random keys are open to a cryptanalytic attack and can be solved given 
enough time and resources. Even ciphers that use a computer generated random key can be broken 
since mathematically generated random numbers are not actually random; they just appear to be 
random. A truly random sequence must be collected from a physical and unpredictable phenomenon 
such as white noise, the timing of a hard disk read/write head or radioactive decay. Only a truly random 
key can be used with a Vernam cipher to ensure it is mathematically impossible to break. 


Symmetric (private key) encryption 


Symmetric encryption, also known as private key encryption, uses the same key to encrypt and 
decrypt data. This means that the key must also be transferred (known as key exchange) to the same 
destination as the ciphertext, which causes obvious security problems. The key can be intercepted as 
easily as the ciphertext message to decrypt the data. For this reason asymmetric encryption can be 
used instead. 


Asymmetric (public key) encryption 


Asymmetric encryption uses two separate, but related keys. One key, known as the public key, is 
made public so that others wishing to send you data can use this to encrypt the data. This public key 
cannot decrypt data. Another private key is known only by you and only this can be used to decrypt the 
data. It is virtually impossible to deduce the private key from the public key. It is possible that a message 
could be encrypted using your own public key and sent to you by a malicious third party impersonating a 
trusted individual. To prevent this, a message can be digitally ‘signed’ to authenticate the sender. 


[Figure: asymmetric encryption. The recipient's public key is made available to others wanting to send 
the recipient data securely. The sender uses the recipient's public key to encrypt data before sending. 
The encrypted message can be intercepted but cannot be deciphered without the private key. Data 
encrypted with the user's public key can only be decrypted with the user's private key.] 
Hashing 


A hashing function provides a mapping between an arbitrary length input and a usually fixed length or 
smaller output. Unlike the encryption techniques described above, it is one-way; you cannot get back 
to the original. This is useful for storing encrypted PINs and passwords so that they cannot be read 
by a hacker. To verify a user’s password, the software applies the hash function to the user input and 
compares it with the one stored. 


Methods of hashing are discussed in Section 7, Chapter 37. 


Cryptographic hash functions 


A hash total is a mathematical value calculated from unencrypted message data. This value is also 
referred to as a checksum or digest. The process is irreversible and impossible to crack other than by 
trying all of the possible inputs until a match is found. Since the hash total is generated from the entire 
message, even the slightest change in the message will produce a different total. 


[Figure: the input texts “Apples and pears” and “Apples and bears” are each passed through the hash 
function SHA-1. Although the inputs differ by a single letter, the two hash totals are completely different.] 
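

The effect of a one-letter change can be seen using Python's standard hashlib module. The helper names sha1_hex and check_password, and the sample password, are assumptions made for this sketch. SHA-1 is used only because it is the function named in the example; it is no longer recommended for security purposes, and real systems store passwords using salted, deliberately slow hash functions.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 hash total of the text as 40 hexadecimal digits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


digest_pears = sha1_hex("Apples and pears")
digest_bears = sha1_hex("Apples and bears")
print(digest_pears)
print(digest_bears)
print(digest_pears == digest_bears)      # False: one letter changes the total

# Verifying a password: only the hash is stored, never the password itself
stored_hash = sha1_hex("correct horse")  # hypothetical stored value


def check_password(attempt):
    return sha1_hex(attempt) == stored_hash


print(check_password("correct horse"))   # True
print(check_password("wrong horse"))     # False
```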


Digital signatures 


A digital signature or hash value is the equivalent of a handwritten signature or security stamp, but 
offers even greater security. The sender of the message uses their own private key to encrypt the hash 
total. The encrypted total becomes the digital signature since only the holder of the private key could 
have encrypted it. The signature is attached to the message to be sent and the whole message including 
the digital signature is encrypted using the recipient's public key before being sent. The recipient 
decrypts the message using their private key, and decrypts the digital signature using the sender's public 
key. The hash total is then reproduced based on the message data and if this matches the total in the 
digital signature, it is certain that the message genuinely came from the sender and that no parts of the 
message were changed during transmission. To ensure that the message could not be copied and resent 
at a later date, the time and date can be included in the original message, which if altered, would cause a 
different hash total to be generated. 
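

The signing and checking steps can be sketched in Python with a toy RSA-style key pair. Everything here is an assumption for the purposes of illustration: the key is built from two well-known Mersenne primes and is far too small to be secure, SHA-256 stands in for the hash function, the message is invented, and the sketch leaves out the separate step of encrypting the whole message with the recipient's public key. Real systems use a tested cryptographic library with padding schemes that are not shown.

```python
import hashlib

# Toy key pair (insecure, for illustration only)
p, q = 2**61 - 1, 2**89 - 1                  # two known Mersenne primes
n = p * q                                    # part of both keys
e = 65537                                    # public exponent
d = pow(e, -1, (p - 1) * (q - 1))            # private exponent (Python 3.8+)


def hash_total(message):
    """Hash the message and reduce it to a number smaller than n."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n


def sign(message):
    """Sender: encrypt the hash total with the private key."""
    return pow(hash_total(message), d, n)


def verify(message, signature):
    """Recipient: decrypt the signature with the public key and compare."""
    return pow(signature, e, n) == hash_total(message)


message = "Pay Bob 10 pounds. Sent 09:00, 1 June"
signature = sign(message)

print(verify(message, signature))                                    # True
print(verify("Pay Bob 90 pounds. Sent 09:00, 1 June", signature))    # False
```

Because the time and date are part of the signed text, a copy resent later with an altered timestamp would also fail the check.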


Digital signatures can be used with any kind of message regardless of whether encryption has also 
been used. They can be used with most email clients or browsers making it easy to sign outgoing 
communications and validate signed incoming messages. If set up to use digital signatures, your 
browser should warn you if you download something that does not have a digital signature. This would 
also mean that anything sent by you, including online commercial and banking transactions, can be 
verified as your own. 


Hoax digital signatures could be created using a bogus private key claiming to be that of a trusted 
individual. In order to mitigate against this, a digital certificate verifies that a sender's public key is 
formally registered to that particular sender. 


Digital certificates 


While digital signatures verify the trustworthiness of message content, a digital certificate is issued by 
official Certificate Authorities (CAs) such as Symantec or Verisign and verifies the trustworthiness of 
a message sender or website. This certificate allows the holder to use the Public Key Infrastructure 
or PKI. The certificate contains the certificate’s serial number, the expiry date, the name of the holder, 

a copy of their public key, and the digital signature of the CA so that the recipient can authenticate the 
certificate as real. Digital certificates operate within the Transport layer of the TCP/IP protocol stack. 


TCP/IP is covered in Chapter 22. 
Exercises 
1. (a) Explain why compression is considered necessary for images on the web. [2] 
(b) Explain why lossy compression techniques would not be suitable for use with files 
containing large bodies of text. [1] 
(c) Suggest a suitable lossless method for compressing text. [1] 
2. (a) Explain the difference between lossy and lossless data compression. [2] 


(b) Run-length encoding (RLE) is a pattern substitution compression algorithm. Data is stored 
in the format (colour,run), where 0 = White and 1 = Black. 


a. (0,1), (1,5), (0,1) 
b. (1,7) 
c. (1,1), (0,2), (1,1), (0,2), (1,1) 
d. (1,7) 
e. (0,1), (1,1), (0,1), (1,1), (0,1), (1,1), (0,1) 
f. (0,1), (1,1), (0,1), (1,1), (0,1), (1,1), (0,1) 
g. (0,1), (1,1), (0,3), (1,1), (0,1) 


Reassemble the encoded sequence above to form a 7x7 web icon image in the grid below. [3] 


(c) RLE encoding is a lossless compression method. Give one disadvantage of lossless 
compression over lossy methods for the compression of images. [1] 


3. (a) State what is meant by symmetric encryption and explain with the aid of an example how 
it can be implemented. [4] 
(b) (i) Explain what is meant by asymmetric encryption. [4] 
(ii) Explain why this form is more secure than symmetric encryption. [2] 
Chapter 16 — Database concepts 


Objectives 
* Explain the concept of a relational database 


* Define the terms: flat file, entity, attribute, primary key, foreign key, secondary key, entity relationship 
modelling, referential integrity 


* Produce an entity relationship model for a simple scenario involving multiple entities 


Modelling data requirements 


When a systems designer begins work on a new proposed computer system, one of the first things they 
need to do is to examine the data that needs to be input, processed and stored and determine what the 
data entities are. 


Definition: An entity is a category of object, person, event or thing of interest to an organisation about 
which data is to be recorded. 


Examples of entities are: Employee, Film, Actor, Product, Recipe, Ingredient. Each entity in a database 
system has attributes. 


A flat file database 


A flat file database consists of a single file. It might be a suitable structure to hold the names and 
addresses of all members of a sports club, or information about all the DVDs in your personal collection. 


Most databases, however, are concerned with more than one entity, and the relationships between the 
entities. In a collection of DVDs, you might want to keep a record of which main actors starred in each 
film. Actor would be a second entity in its own right. 


Example 1 


A dentist's surgery employs several dentists, and an appointments system is required to allow patients to 
make appointments with a particular dentist. 


Entities in this system include Dentist, Patient and Appointment. The attributes of Dentist may include 
Title, Firstname, Surname, Qualification. 


Attributes of Patient may include Title, Firstname, Surname, Address, Telephone. 


Entity descriptions 
An entity description is normally written using the format 
Entity1 (Attribute1, Attribute2...) 
The entity description for Dentist is therefore written 


Dentist (Title, Firstname, Surname, Qualification) 
Entity identifier and primary key
Each entity needs to have an identifier which uniquely identifies the entity. In a relational database, the
entity identifier is known as the primary key and it will be referred to as such in this section. Clearly none
of the attributes so far identified for Dentist and Patient is suitable as a primary key. A numeric or string
ID such as 013649 could be used. In the entity description, the primary key is underlined.


Dentist (DentistID, Title, Firstname, Surname, Qualification) 


Secondary key 


A database needs to be set up so that it can be searched quickly. An index of all the primary keys in 
the database, and where the record is held, is automatically maintained by the database software. 
However, more than one index may be needed. 


If, for example, a patient rings up to make an appointment with the dentist, they are unlikely to know their
patient ID. A secondary index on surname is therefore likely to be held.

Relationships between entities 
The different entities in a system may be linked in some way, and the two entities are said to be related. 


There are only three different 'degrees' of relationship between two entities. A relationship may be:

• One-to-one      Examples of such a relationship include the relationship between Husband and
                  Wife, Country and Prime Minister.
• One-to-many     Examples include the relationship between Mother and Child, Customer and
                  Order, Borrower and Library Book.
• Many-to-many    Examples include the relationship between Student and Course, Stock Item and
                  Supplier, Film and Actor.


Entity relationship modelling 


The relationship between entities can be modelled graphically. An entity relationship diagram is a 
diagrammatic way of representing the relationships between the entities in a database. To show the 
relationship between two entities, both the degree and the name of the relationship need to be specified. 
E.g. In the first relationship shown below, the degree is one-to-one, the name of the relationship is in 
charge of. 


Headteacher ---- in charge of ---- School       One-to-one


Dentist ----< Patient                           One-to-many


Customer >---- orders ----< Product             Many-to-many
The concept of a relational database
In a relational database, a separate table is created for each entity identified in the system. Where a 
relationship exists between entities, an extra field called a foreign key links the two tables. 


Foreign key 


A foreign key is an attribute that creates a join between two tables. It is the attribute that is common to 
both tables, and the primary key in one table is the foreign key in the table to which it is linked. 


Example: In the one-to-many relationship between Dentist and Patient, the entity on the 'many’ side of 
the relationship will have DentistID as an extra attribute. This is the foreign key. 


Dentist            Patient
---------------    ---------------
DentistID*         PatientID*
Title              Title
FirstName          FirstName
Surname            Surname
Qualification      Address
                   Telephone
                   DentistID (FK)


Note that the primary key is indicated by an asterisk, and the foreign key is marked (FK).
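
The link made by a foreign key can be tried out with a short script. The following is a minimal sketch using Python's built-in sqlite3 module; the tables are cut down to a few columns, and the surnames and ID values are invented purely for illustration.

```python
import sqlite3

# In-memory database; all sample rows below are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Dentist (
    DentistID INTEGER PRIMARY KEY,
    Surname   TEXT NOT NULL
);
CREATE TABLE Patient (
    PatientID INTEGER PRIMARY KEY,
    Surname   TEXT NOT NULL,
    DentistID INTEGER REFERENCES Dentist(DentistID)  -- foreign key
);
INSERT INTO Dentist VALUES (1, 'Shah'), (2, 'Lopez');
INSERT INTO Patient VALUES (10, 'Brown', 1), (11, 'Green', 1), (12, 'White', 2);
""")

# Follow the foreign key from each patient to the matching dentist.
rows = con.execute("""
    SELECT Patient.Surname, Dentist.Surname
    FROM Patient JOIN Dentist ON Patient.DentistID = Dentist.DentistID
    ORDER BY Patient.PatientID
""").fetchall()
print(rows)
```

Two patients share DentistID 1, which is what places Patient on the 'many' side of the relationship.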


Linking tables in a many-to-many relationship 
When there is a many-to-many relationship between two entities, tables cannot be directly linked in 
this way. For example, consider the relationship between Student and Course. A student takes many 
courses, and the same course is taken by many students. 


Student >----< Course       Many-to-many


In this case, an extra table is needed to link the Student and Course tables. We could call this 
StudentCourse, or Enrolment, for example. 


Student ----< Enrolment >---- Course


The three tables will now have attributes something like those shown below: 
Student (StudentID, Name, Address)
Enrolment (StudentID, CourseID)
Course (CourseID, Subject, Level)
Composite key 
In this data model, the table linking Student and Course has two foreign keys, each linking to one of
the two main tables. The two foreign keys also act as the primary key of this table. A primary key which
consists of more than one attribute is called a composite primary key.
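
A composite primary key can be explored with a small experiment. This sketch uses Python's built-in sqlite3 module and, to stay short, creates only the Enrolment table; the ID values are made up. The same student may appear many times and so may the same course, but the same pair may appear only once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Enrolment (
    StudentID INTEGER NOT NULL,
    CourseID  TEXT NOT NULL,
    PRIMARY KEY (StudentID, CourseID)   -- composite primary key
);
INSERT INTO Enrolment VALUES (1, 'CS1');
INSERT INTO Enrolment VALUES (1, 'MA1');  -- same student, different course: allowed
INSERT INTO Enrolment VALUES (2, 'CS1');  -- same course, different student: allowed
""")

try:
    # Repeating an existing (StudentID, CourseID) pair breaks uniqueness.
    con.execute("INSERT INTO Enrolment VALUES (1, 'CS1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = con.execute("SELECT COUNT(*) FROM Enrolment").fetchone()[0]
print(duplicate_rejected, row_count)
```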
Drawing an entity relationship diagram
A database system will frequently involve many different entities linked to each other, and an entity 
relationship diagram can be drawn to show all the relationships. 

Example 2 


A hospital inpatient system may involve entities Ward, Nurse, Patient and Consultant. A ward is 
staffed by many nurses, but each nurse works on only one ward. A patient is in a ward and has many 
nurses looking after them, as well as a consultant, who sees many patients on different wards.


[Figure: entity relationship diagram linking Ward, Nurse, Patient and Consultant]


Referential integrity 


When tables are linked in a relational database, it is important to ensure that, for example, a particular 
component is not deleted if it is used in a product in the Product table. This is known as referential 
integrity. 


[Screenshot: MS Access relationship dialog joining two tables on School ID, with the options
Enforce Referential Integrity, Cascade Update Related Fields and Cascade Delete Related Records]


Enforcing referential integrity, linked by School ID


The screenshot above shows a relationship being created in MS Access between two tables linked 
by School ID. 
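
Referential integrity can also be demonstrated outside Access. The sketch below uses Python's built-in sqlite3 module, in which foreign key checking has to be switched on for each connection. The Component and Product tables, their columns and their rows are hypothetical, chosen to match the component and product example mentioned above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
con.executescript("""
CREATE TABLE Component (
    CompID   TEXT PRIMARY KEY,
    CompName TEXT
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    CompID      TEXT REFERENCES Component(CompID)
);
INSERT INTO Component VALUES ('C1', 'Glass eye');
INSERT INTO Product VALUES (1, 'Teddy bear', 'C1');
""")

try:
    # The component is still used by a product, so the delete should be refused.
    con.execute("DELETE FROM Component WHERE CompID = 'C1'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

remaining = con.execute("SELECT COUNT(*) FROM Component").fetchone()[0]
print(delete_blocked, remaining)
```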
Exercises


1. An estate agent keeps a database of all the properties it has for sale, the owners of the properties,
and all the prospective buyers.


Details about the properties for sale, including address, number of bedrooms, type of property, 
asking price are held in a table called Property. 


(a) Suggest a suitable primary key for the Property table. 


(b) Suggest two attributes in the Property table that may be defined as secondary keys, 
justifying why each should be defined in this way. 


Data on prospective buyers include name, telephone, address, type of property required, 
lower and upper limit for price. 


Data on vendors include name, address, telephone. 
A fourth entity, Viewing, holds data about all viewings.
(c) Suggest three attributes for the entity Viewing.


(d) Write entity descriptions for each of the entities Property, Vendor, Buyer and Viewing.
In each case, identify any primary and foreign keys. 


(e) Draw an entity relationship diagram showing relationships between these four entities. 


2. A library plans to set up a database to keep track of its members, books and loans. Entities are
defined as follows: 


Member (MemberID, Surname, FirstName, Address)
Book (BookID, ISBN, Title, Author)
Loan (MemberID, BookID, LoanDate, DueDate)
When the book is returned the loan record is deleted. 
(a) Draw an entity relationship diagram showing the relationships between the entities. 


(b) A relational database is created with tables for each of these entities. The key in the Loan 
table is made up of two fields. 


What is the name given to a key that is made up of multiple attributes? 


(c) What is meant by a foreign key? Identify a foreign key in one of the tables. 
3. An exam board wants to set up a database to hold data about its courses, exam papers, exam
entries, candidates and results. For the purpose of this exercise, assume that each candidate 
can sit each exam once only. A course may have several exam papers (Comp 1, Comp 2, etc.). 


The data to be stored for the candidate are CandidateNumber, FirstName, Surname, DateOfBirth. 
The data to be stored for the course are CourseID, Subject, Level.


The data held for each individual exam paper includes CourseID, ExamPaperID, DateOfExam,
Title, TotalMarks, ExamPaperWeighting.


(a) State an identifier for the entity ExamPaper. [1]
(b) Draw an entity-relationship diagram showing the relationships between the entities. [3]


(c) Write an entity description for a Results entity which will store the exam mark that candidates 
receive for each exam paper. [2] 


4. (a) Discuss the suitability of flat files and relational databases for use by a family at home and
for use in a large mail order company.


The quality of written communication will be assessed in your answer to this question. [8]
(b) In any relational database, primary and foreign keys are used.
(i) What is a primary key? [1]


(ii) Explain the use of a primary key as a foreign key. [3]


OCR F453/01 Qu 9 June 2014
Chapter 17 — Relational databases and
normalisation 


Objectives 


• Describe the use of secondary keys and indexing
• Normalise relations to third normal form (A-Level only)
• Understand why databases are normalised (A-Level only)


Relational database design 


In a relational database, data is held in tables (also called relations) and the tables are linked by means 
of common attributes. 


A relational database is a collection of tables in which relationships are modelled by shared attributes. 


Conceptually then, one row of a table holds one record. Each column in the table represents 
one attribute. 


e.g. A table holding data about an entity Book may have the following rows and columns: 
Book


BookID | DeweyCode | Title                     | Author    | DatePublished
       | 345.440   | The Paying Guests         |           | 2014
       | 345.440   | Fragile Lies              | Elliot, L |
       | 200.00    | Learn French with stories | Bibard, F | 2014


To describe the table shown above, you would write 
Book (BookID, DeweyCode, Title, Author, DatePublished)
Note that: 
The entity name is shown outside the brackets 
The attributes are listed inside the brackets 
The primary key is underlined 


The primary key is composed of one or more attributes that will uniquely identify a particular record in the 
table. (When describing an entity this is called an entity identifier.) 


Indexing 


In order that a record with a particular primary key can be quickly located in a database, an index of 
primary keys will be automatically maintained by the database software, giving the position of each 
record according to its primary key. 


One or more secondary indexes may be defined when the database is created, for any attribute that is
often used as a search criterion. For example, in the above table both Author and Title might be defined
as secondary keys. This would speed up searches on either of these fields, which would otherwise have
to be searched sequentially.
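
The effect of a secondary index can be observed in SQLite, used here through Python's built-in sqlite3 module. This is only a sketch: the index name idx_book_author is invented, the Book table is cut down to three columns, and the wording of the query plan differs between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Book (BookID INTEGER PRIMARY KEY, Title TEXT, Author TEXT)")
# Secondary index on Author (the index name is an assumption for this sketch).
con.execute("CREATE INDEX idx_book_author ON Book (Author)")

# Ask SQLite how it would carry out a search on Author.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Title FROM Book WHERE Author = ?", ("Elliot, L",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the description
print(plan_text)
```

The plan should name idx_book_author, showing that the search uses the index rather than reading every row in turn.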


Linking database tables 


Tables may be linked through the use of a common attribute. This attribute must be a primary key of one 
of the tables, and is known as a foreign key in the second table.


We saw in the last chapter that there are three possible types of relationship between entities: one-to- 
one, one-to-many and many-to-many. 


Normalisation 


Normalisation is a process used to come up with the best possible design for a relational database. 
Tables should be organised in such a way that: 


• no data is unnecessarily duplicated (i.e. the same data item held in more than one table)
• data is consistent throughout the database (e.g. a customer is not recorded as having different
  addresses in different tables of the database). Consistency should be an automatic consequence
  of not holding any duplicated data. This means that anomalies will not arise when data is inserted,
  amended or deleted.
• the structure of each table is flexible enough to allow you to enter as many or as few items (for
  example, components making up a product) as required
• the structure should enable a user to make all kinds of complex queries relating data from
  different tables


There are three basic stages of normalisation known as first, second and third normal form. 


First normal form 
A table is in first normal form (1NF) if it contains no repeating attribute or groups of attributes. 


Example 1 


A company manufacturing soft toys buys the component parts (fake fur, glass eyes, stuffing, growl etc.) 
from different suppliers. Each component may be used in the manufacture of several different toys (teddy 
bear, dog, duck etc.) Each component comes from a sole supplier. 


Sample data to be held in the database is shown in the table: 


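ProductID | ProductName  | CostPrice | SellingPrice | CompName
123       | Small monkey | 2.50      | 5.95         | Stuffing
          |              |           |              | Eye (small)
          |              |           |              | Brown Fur
156       | Pink kitten  | 3.10      |              | Stuffing
          |              |           |              | Eye (medium)
          |              |           |              | Pink Fur
          |              |           |              | Soundbox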


Table 17.1 
A-Level only


As the first stage in normalisation, we need to note that there are repeating groups of attributes in this
table; for example, ProductID 123 has three components with IDs STO1, G56 and FF77. We need to split
the data into two tables to get rid of the repeating groups.


Note that a table in a relational database may be referred to as a relation. 


Two entities, Product and Component, can be identified. These have the following relationship: 


Product >---- has ----< Component


These two entities could be represented in standard notation: 
Product (ProductID, ProductName, CostPrice, SellingPrice)
Component (CompID, CompName, SupplierID, SupplierName)


We have not yet put CompQty (the amount or number of each component that is needed to make a 
particular product) in either table, but we will come to that. 


The two tables need to be linked by means of a common attribute, but the problem is that because this 
is a many-to-many relationship, whichever table we put the link attribute into, there needs to be more 
than one attribute. 


e.g. Product (ProductID, ProductName, CostPrice, SellingPrice, CompQty, CompID)
is no good because each toy has several components, so which one would be mentioned?


Similarly, Component (CompID, CompName, SupplierID, SupplierDetails, ProductID)


is no good either because each component is used in a number of different products. 


One obvious solution (and unfortunately a bad one) springs to mind. How about allowing space for four 
components in the record for each product? 


Product (ProductID, ProductName, CostPrice, SellingPrice, CompID1, CompQty1,
CompID2, CompQty2, CompID3, CompQty3, CompID4, CompQty4)


This table contains repeating attributes, which are not allowed in first normal form. The attributes
CompID and CompQty are repeated four times. The table is therefore NOT in first normal form.


It would be represented in standard notation with a line over the repeating attributes:
Product (ProductID, ProductName, CostPrice, SellingPrice, CompID, CompQty)
To put the data into first normal form, the repeating attributes must be removed. 


Introducing the link table 
At this stage it becomes clear why we need a third table to link the two tables Product and Component. 


Product ----< ProductComp >---- Component


The three tables now have attributes as follows: 
Product (ProductID, ProductName, CostPrice, SellingPrice)
ProductComp (ProductID, CompID, CompQty)
Component (CompID, CompName, SupplierID, SupplierName)
The design is now in 1NF because it contains no repeating attribute or groups of attributes.
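
To see how the link table copes with any number of components per product, the design can be built in SQLite through Python's built-in sqlite3 module. This is a sketch only: the tables are cut down to a few columns, the quantities are invented, and the pairing of each component ID with a component name is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE Component (CompID TEXT PRIMARY KEY, CompName TEXT);
CREATE TABLE ProductComp (
    ProductID INTEGER REFERENCES Product(ProductID),
    CompID    TEXT REFERENCES Component(CompID),
    CompQty   INTEGER,
    PRIMARY KEY (ProductID, CompID)
);
INSERT INTO Product VALUES (123, 'Small monkey'), (156, 'Pink kitten');
-- Pairing of IDs with names and all quantities are assumed for illustration.
INSERT INTO Component VALUES
    ('STO1', 'Stuffing'), ('G56', 'Eye (small)'), ('FF77', 'Brown Fur');
INSERT INTO ProductComp VALUES
    (123, 'STO1', 1), (123, 'G56', 2), (123, 'FF77', 1), (156, 'STO1', 1);
""")

# One row per component of product 123, however many components there are.
components = con.execute("""
    SELECT Component.CompName, ProductComp.CompQty
    FROM ProductComp JOIN Component ON ProductComp.CompID = Component.CompID
    WHERE ProductComp.ProductID = 123
    ORDER BY Component.CompName
""").fetchall()
print(components)
```

Adding a fifth or sixth component to a product needs only another row in ProductComp, not another column in Product.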


Dealing with a Many-to-Many relationship 


As you get more practice in database design, you will notice that whenever two entities have a
many-to-many relationship, you will always need a link table 'in the middle'. Thus:


[Diagram: two entities joined by a many-to-many relationship]


will become:


[Diagram: the same two entities, each joined by a one-to-many relationship to a link table between them]


Second normal form - Partial key dependence test


A table is in second normal form (2NF) if it is in first normal form and contains no partial
dependencies. A partial dependency would mean that one or more of the attributes depends on only
part of the primary key, which can only occur if the primary key is a composite key.


The only table in which this could arise is ProductComp as this is the only table with a composite
primary key. However, the only attribute in this table apart from the primary key is CompQty, which
depends on both parts of the primary key: which product, and which particular component in
that product.


The tables are therefore now in second normal form. 


(To demonstrate tables which are not in second normal form, we'll look at Example 2 shortly.) 


Third normal form - Non-key dependence test 


A table is in third normal form (3NF) if it is in second normal form and contains no ‘non-key 
dependencies’. A non-key dependency is one where the value of an attribute is determined by the value 
of another attribute which is not part of the key. 3NF means that: 


All attributes are dependent on the key, the whole key, and nothing but the key. 


Looking at the Component table, the SupplierName attribute is dependent on SupplierID and not on
CompID, the primary key. It therefore needs to be removed from this relation and a new relation created.


The database, now in third normal form, consists of the following tables: 
Product (ProductID, ProductName, CostPrice, SellingPrice)
ProductComp (ProductID, CompID, CompQty)
Component (CompID, CompName, SupplierID)
Supplier (SupplierID, SupplierName)
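
One practical benefit of moving SupplierName into its own table is that a supplier's name is stored once only. The sketch below, using Python's built-in sqlite3 module, changes a supplier's name with a single update and shows every component picking up the change. The supplier names and the second component ID are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY, SupplierName TEXT);
CREATE TABLE Component (
    CompID     TEXT PRIMARY KEY,
    CompName   TEXT,
    SupplierID INTEGER REFERENCES Supplier(SupplierID)
);
INSERT INTO Supplier VALUES (1, 'Acme Furs');            -- hypothetical supplier
INSERT INTO Component VALUES ('FF77', 'Brown Fur', 1), ('FF78', 'Pink Fur', 1);
""")

# The name is held in one row, so one UPDATE is enough.
con.execute("UPDATE Supplier SET SupplierName = 'Acme Fabrics' WHERE SupplierID = 1")

names = con.execute("""
    SELECT Component.CompID, Supplier.SupplierName
    FROM Component JOIN Supplier ON Component.SupplierID = Supplier.SupplierID
    ORDER BY Component.CompID
""").fetchall()
print(names)
```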


The entity relationship diagram showing the relationships between these four tables in third normal form
is shown below. Each entity has its own table.


Product ----< ProductComp >---- Component >---- Supplier


Example 2 


A school plans to keep records of Sports Day events for different years in a database. The data that 
needs to be held for each event in a particular year is illustrated in the following table: 


EventID | Year | EventName           | Winner        | TimeOrDistance
GA100   | 2015 | Girls Under 14 100m | Claire Gordon | 610


The entity description is:


Event (EventID, Year, EventName, Winner, TimeOrDistance)


The composite primary key is composed of EventID and Year. Winner and TimeOrDistance depend on 
the whole key. 


However, EventName depends only on EventID, not on Year, so this is a partial dependency. This table
is therefore not normalised. It does not satisfy the requirement of a table in second normal form, namely 
that there are no partial dependencies. 
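
The text does not show the corrected design, so the following is one possible way of removing the partial dependency, sketched with Python's built-in sqlite3 module. The table name EventResult, the 2014 row and both time values are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- EventName depends only on EventID, so it moves to a table keyed on EventID.
CREATE TABLE Event (
    EventID   TEXT PRIMARY KEY,
    EventName TEXT NOT NULL
);
-- Winner and TimeOrDistance depend on the whole key (EventID, Year).
CREATE TABLE EventResult (
    EventID        TEXT REFERENCES Event(EventID),
    Year           INTEGER,
    Winner         TEXT,
    TimeOrDistance TEXT,
    PRIMARY KEY (EventID, Year)
);
INSERT INTO Event VALUES ('GA100', 'Girls Under 14 100m');
INSERT INTO EventResult VALUES ('GA100', 2014, 'Hypothetical Winner', '14.9');
INSERT INTO EventResult VALUES ('GA100', 2015, 'Claire Gordon', '14.2');
""")

results = con.execute("""
    SELECT EventResult.Year, Event.EventName, EventResult.Winner
    FROM EventResult JOIN Event ON EventResult.EventID = Event.EventID
    ORDER BY EventResult.Year
""").fetchall()
print(results)
```

The event name is now stored once, however many years of results are recorded for that event.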


The importance of normalisation


A normalised database has major advantages over an un-normalised one. 


No data redundancy 


One of the aims of normalising a database design is to remove the possibility of redundant data from any 
of the tables. Redundant data is data that appears in more than one database table, which can cause 
inefficiencies and inconsistencies in the data, as explained in the next paragraph. 


Maintaining and modifying the database 
It is easier to maintain and change a normalised database.
Data integrity is maintained since there is no unnecessary duplication of data. For example, a customer
with a particular customer ID will have their personal details stored only once. If the customer changes
address, the update needs only to be made to a single table, so there is no possibility of inconsistencies
arising with different addresses for the customer being held on different files.


It will also be impossible to insert transactions such as details of an order, for a customer who is not 
recorded in the database. 
Faster sorting and searching


Normalisation will produce smaller tables with fewer fields. This results in faster searching, sorting and 
indexing operations as there is less data involved. 


A further advantage is that holding data only once saves storage space. 


Deleting records 


A normalised database with correctly defined relationships between tables will not allow records in a
table on the 'one' side of a one-to-many relationship to be deleted accidentally. For example, a customer
who still has unresolved transactions on file, such as an unpaid invoice, cannot be deleted.


Exercises 


1. The publisher of several magazines has a relational database in which the details of each magazine 
are held. One of the tables in the database holds details of all the major articles in each magazine. 


(a) Write a description for entities Magazine and Article, showing for each table the primary key, a 
foreign key if applicable, and at least two other attributes, using the format 


EntityName (primary key, attribute1, attribute2, attribute3... foreign key) [6]


(b) Suggest, with a reason, an attribute in either table which it would be useful to define as a
secondary key. [2]


2. A college department wishes to create a database to hold information about students and the
courses they take. The relationship between students and courses is shown in the following entity 
relationship diagram. 


Each course has a tutor who is in charge of the course. 


Sample data held on the database is shown in the table below. 


StudentNumber | StudentName | DateOfBirth | Gender | CourseNumber | CourseName | TeacherID | TeacherName


2222 — F | 12-08-1997 Java mean 
Intro to OOP Ross,M 
Animation Day,S 
3333 BehrK | 31-07-1996 COMP16 _ | Intro to OOP 2299 Ross,M 
COMP34 | Database acini 3370 Blaine, N 


(a) Show how the data may be rearranged into relations which are in third normal form. 


(b) State two properties that the tables in a fully normalised database must have. [2]
3. A museum has permanent displays but also runs a programme of special events. People may pay
an annual fee to become Friends of the Museum. Friends can attend events, which they must book
in advance. This, and other data about the museum, is stored in a relational database. Part of the 
entity-relationship (E-R) diagram is shown. 


FRIEND ----< TICKET >---- EVENT


(a) (i) State the type of relationship between FRIEND and TICKET. [1]
(ii) Explain the use of primary and foreign keys in FRIEND and TICKET. [4]


(b) When the database was being designed, an initial version of the diagram showed a direct
relationship between FRIEND and EVENT.


(i) Draw this initial E-R diagram with FRIEND and EVENT only. [1]
(ii) Explain why TICKET was inserted. [3]
OCR F453/01 Qu 9 June 2013
Chapter 18 — Introduction to SQL


Objectives
• Be able to use SQL to retrieve data from multiple tables of a relational database (A-Level only)
• Be able to interpret and modify SQL (A-Level only)


SQL 
SQL, or Structured Query Language (pronounced either as S-Q-L or Sequel), is a declarative
language used for querying and updating tables in a relational database. It can also be used to create
tables. In this chapter, we will look at SQL statements used in querying a database.


The tables shown in Tables 18.1, 18.2 and 18.3 below will be used to demonstrate some SQL 
statements. The tables are part of a database used by a retailer to store details of CDs in a database that 
will allow information about the CDs to be extracted. The four entities CD, CDSong, Song and Artist are 
connected by the following relationships: 


Figure 18.1 
The CD table is shown below. 


DatePublished
06/05/2014
24/03/2015
11/10/2015


Table 18.1: CD table 
SELECT .. FROM .. WHERE 


The SELECT statement is used to extract a collection of fields from a given table. The basic syntax of this 
statement is 


SELECT      list the fields to be displayed
FROM        list the table or tables the data will come from
WHERE       list the search criteria
ORDER BY    list the fields that the results are to be sorted on (default is ascending order)


Example 1
SELECT CDTitle, RecordCompany, DatePublished
FROM CD
WHERE DatePublished BETWEEN #01/01/2015# AND #31/12/2015#
ORDER BY CDTitle


This will return the following records: 


Night Turned Day 24/03/2015 
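
Example 1 can be tried in SQLite using Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite has no #...# date literals, so dates are stored as text in year-month-day order, which makes BETWEEN compare them correctly. The record company names are invented, and only three sample CDs are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE CD (CDNumber TEXT PRIMARY KEY, CDTitle TEXT, "
    "RecordCompany TEXT, DatePublished TEXT)"
)
# Record companies are hypothetical; dates are written as YYYY-MM-DD text.
con.executemany("INSERT INTO CD VALUES (?, ?, ?, ?)", [
    ("CD77233", "Lucky Me", "ABC", "2014-05-24"),
    ("CD77665", "Flying High", "DEF", "2015-07-31"),
    ("CD19998", "Night Turned Day", "DEF", "2015-03-24"),
])

rows_2015 = con.execute("""
    SELECT CDTitle, RecordCompany, DatePublished
    FROM CD
    WHERE DatePublished BETWEEN '2015-01-01' AND '2015-12-31'
    ORDER BY CDTitle
""").fetchall()
print(rows_2015)
```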


Conditions


Conditions in SQL are constructed from the following operators:


Symbol  | Meaning                               | Example                                             | Notes
=       | Equal to                              | CDTitle = "Autumn"                                  | Use single or double quotes, depending on the implementation
>       | Greater than                          | DatePublished > #01/01/2015#                        | The date is enclosed in quote marks or, in Access, # symbols
<       | Less than                             | DatePublished < #01/01/2015#                        |
!=      | Not equal to                          | RecordCompany != "ABC"                              |
>=      | Greater than or equal to              | DatePublished >= #01/01/2015#                       |
<=      | Less than or equal to                 | DatePublished <= #01/01/2015#                       |
IN      | Equal to a value within a set of      | RecordCompany IN ("ABC", "DEF")                     |
        | values                                |                                                     |
LIKE    | Similar to                            | CDTitle LIKE "S%"                                   | Finds Shadows (wildcard operator varies between implementations)
BETWEEN | Within a range, including the two     | DatePublished BETWEEN #01/01/2015# AND #31/12/2015# |
        | values which define the limits        |                                                     |
AND     | Both expressions must be true for the |                                                     |
        | entire expression to be judged true.  |                                                     |
OR      | If either or both of the expressions  | RecordCompany = "ABC" OR RecordCompany = "DEF"      | Equivalent to RecordCompany IN ("ABC", "DEF")
        | are true, the entire expression is    |                                                     |
        | judged true.                          |                                                     |
NOT     | Inverts truth                         | RecordCompany NOT IN ("ABC", "DEF")                 |
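
A few of these operators can be compared side by side in SQLite using Python's built-in sqlite3 module. The sketch below uses three invented CDs and a small helper function, titles, which is not part of SQL or of sqlite3. SQLite uses % as the LIKE wildcard and accepts single quotes around text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CD (CDTitle TEXT, RecordCompany TEXT)")
con.executemany("INSERT INTO CD VALUES (?, ?)", [
    ("Autumn", "ABC"), ("Shadows", "DEF"), ("Lucky Me", "GHI"),  # hypothetical rows
])

def titles(condition):
    """Return CD titles matching a WHERE condition, in alphabetical order.

    The condition is pasted straight into the SQL text, which is acceptable
    only because every condition below is a fixed string written here.
    """
    sql = "SELECT CDTitle FROM CD WHERE " + condition + " ORDER BY CDTitle"
    return [row[0] for row in con.execute(sql)]

like_s = titles("CDTitle LIKE 'S%'")
in_set = titles("RecordCompany IN ('ABC', 'DEF')")
either = titles("RecordCompany = 'ABC' OR RecordCompany = 'DEF'")
not_in = titles("RecordCompany NOT IN ('ABC', 'DEF')")
print(like_s, in_set, either, not_in)
```

The IN and OR forms return the same titles, as the table notes, and NOT IN returns whatever is left.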
Specifying a sort order


ORDER BY gives you control over the order in which records appear in the Answer table. If for example 
you want the records to be displayed in ascending order of RecordCompany and within that, descending 
order of DatePublished, you would write, for example: 


SELECT *
FROM CD
WHERE DatePublished < #31/12/2015#
ORDER BY RecordCompany, DatePublished DESC


This would produce the following results: 


CD77233 Lucky Me 24/05/2014 


CD77665 Flying High 31/07/2015 
CD19998 Night Turned Day 24/03/2015 
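
The two-level sort can be reproduced in SQLite through Python's built-in sqlite3 module. In this sketch the record companies are assumed values, chosen so that two CDs share a company, and the dates are stored as year-month-day text so that they sort correctly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE CD (CDNumber TEXT PRIMARY KEY, CDTitle TEXT, "
    "RecordCompany TEXT, DatePublished TEXT)"
)
con.executemany("INSERT INTO CD VALUES (?, ?, ?, ?)", [
    ("CD19998", "Night Turned Day", "DEF", "2015-03-24"),
    ("CD77233", "Lucky Me", "ABC", "2014-05-24"),
    ("CD77665", "Flying High", "DEF", "2015-07-31"),
])

# Ascending by company, then newest first within each company.
ordered = con.execute("""
    SELECT CDNumber, CDTitle, DatePublished
    FROM CD
    WHERE DatePublished < '2015-12-31'
    ORDER BY RecordCompany, DatePublished DESC
""").fetchall()
sorted_titles = [row[1] for row in ordered]
print(sorted_titles)
```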


Extracting data from several tables 


So far we have only taken data from one table. The Song and Artist tables have the following contents: 


Table 18.2: Song table 


Table 18.3: Artist table 


Using SQL you can combine data from two or more tables, by specifying which table the data is held in. 
For example, suppose you wanted SongTitle, ArtistName and MusicType for all Art Pop music. When 
more than one table is involved, SQL uses the syntax tablename.fieldname. (The table name is 
optional unless the field name appears in more than one table.) 


SELECT Song.SongTitle, Artist.ArtistName, Song.MusicType 
FROM Song, Artist 
WHERE (Song.ArtistID = Artist.ArtistID) AND (Song.MusicType = "Art Pop") 


The condition Song.ArtistID = Artist.ArtistID provides the link between the Song and 
Artist tables so that the artist's name corresponding to the ArtistID in the Song table can be found in the 
Artist table. This will produce the following results: 


SQL JOIN 


JOIN provides an alternative method of combining rows from two or more tables, based on a common 
field between them. The query above could be written as follows: 


SELECT Song.SongTitle, Artist.ArtistName, Song.MusicType
FROM Song
JOIN Artist
ON Song.ArtistID = Artist.ArtistID
WHERE Song.MusicType = "Art Pop"
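
That the two ways of writing the query give the same answer can be checked with Python's built-in sqlite3 module. The artist IDs, the second artist, the second song and the music types below are all invented for this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Artist (ArtistID TEXT PRIMARY KEY, ArtistName TEXT);
CREATE TABLE Song (
    SongID TEXT PRIMARY KEY, SongTitle TEXT, ArtistID TEXT, MusicType TEXT
);
INSERT INTO Artist VALUES ('A1', 'Ju'), ('A2', 'Hypothetical Band');
INSERT INTO Song VALUES
    ('S1234', 'Waterfall', 'A1', 'Art Pop'),
    ('S1256', 'Example Song', 'A2', 'Rock');
""")

# Version 1: the link between the tables is written in the WHERE clause.
where_form = con.execute("""
    SELECT Song.SongTitle, Artist.ArtistName, Song.MusicType
    FROM Song, Artist
    WHERE (Song.ArtistID = Artist.ArtistID) AND (Song.MusicType = 'Art Pop')
""").fetchall()

# Version 2: the link is written with JOIN ... ON.
join_form = con.execute("""
    SELECT Song.SongTitle, Artist.ArtistName, Song.MusicType
    FROM Song
    JOIN Artist ON Song.ArtistID = Artist.ArtistID
    WHERE Song.MusicType = 'Art Pop'
""").fetchall()
print(where_form == join_form, join_form)
```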


The fourth table in the database is the table CDSong which links the songs to one or more of the CDs. 
CDNumber | SongID
CD14356  | S1234
CD14356  | S1258
CD14356  | S1415
CD19998  | S1234
CD19998  | S1389
CD19998  | S1423
CD19998  | S1456
CD25364  | S1256
CD25364  | S1392
CD34512  | S1392
CD34512  | S1234
CD34512  | S1389
CD34512  | S1444
CD77233  | S1256
CD77233  | S1344
CD77233  | S1399
CD77233  | S1456


Table 18.4: CDSong table 


Example 2 


We can make a search to find the CDNumbers and titles of all the CDs containing the song Waterfall, sung
by Ju.


SELECT Song.SongID, Song.SongTitle, Artist.ArtistName, CDSong.CDNumber,
CD.CDTitle
FROM Song, Artist, CDSong, CD
WHERE CDSong.CDNumber = CD.CDNumber
AND CDSong.SongID = Song.SongID
AND Artist.ArtistID = Song.ArtistID
AND Song.SongTitle = "Waterfall"

This will produce the following results: 


SongID | SongTitle | ArtistName | CDNumber | CDTitle
S1234  | Waterfall |            | CD14356  |
S1234  | Waterfall |            | CD19998  | Night Turned Day
S1234  | Waterfall |            | CD34512  |


Note that in the SELECT statement, it does not matter whether you specify Song.SongID or
CDSong.SongID since they are connected. The same is true of CDSong.CDNumber and CD.CDNumber.

The Boolean conditions CDSong.SongID = Song.SongID and Artist.ArtistID = Song.ArtistID
are required to specify the relationships between the data tables. (See the entity relationship
diagram in Figure 18.1.)
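
The four-table query in Example 2 can be run against a cut-down copy of the database using Python's built-in sqlite3 module. In this sketch the artist ID, the song Other Song and the CD titles other than Night Turned Day are invented, and only a handful of CDSong rows are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Artist (ArtistID TEXT PRIMARY KEY, ArtistName TEXT);
CREATE TABLE Song (SongID TEXT PRIMARY KEY, SongTitle TEXT, ArtistID TEXT);
CREATE TABLE CD (CDNumber TEXT PRIMARY KEY, CDTitle TEXT);
CREATE TABLE CDSong (CDNumber TEXT, SongID TEXT, PRIMARY KEY (CDNumber, SongID));
INSERT INTO Artist VALUES ('A1', 'Ju');
INSERT INTO Song VALUES ('S1234', 'Waterfall', 'A1'), ('S1389', 'Other Song', 'A1');
INSERT INTO CD VALUES
    ('CD19998', 'Night Turned Day'),
    ('CD34512', 'Hypothetical Title'),
    ('CD25364', 'Another Hypothetical Title');
INSERT INTO CDSong VALUES
    ('CD19998', 'S1234'), ('CD19998', 'S1389'),
    ('CD34512', 'S1234'), ('CD25364', 'S1389');
""")

# Each AND condition links one pair of tables; the last one picks the song.
cds = con.execute("""
    SELECT CDSong.CDNumber, CD.CDTitle
    FROM Song, Artist, CDSong, CD
    WHERE CDSong.CDNumber = CD.CDNumber
      AND CDSong.SongID = Song.SongID
      AND Artist.ArtistID = Song.ArtistID
      AND Song.SongTitle = 'Waterfall'
    ORDER BY CDSong.CDNumber
""").fetchall()
print(cds)
```

Leaving out any one of the linking conditions would pair rows that do not belong together, so the result would contain spurious combinations.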


Exercises


1. A school keeps records of school trips on a database. There are four tables on the database
named PUPIL, TRIP, TEACHER, PUPILTRIP, defined as follows:


PUPIL (PupilID, PupilSurname, PupilFirstName)
TRIP (TripID, Description, StartDate, EndDate, Destination, NumberOfStudents, TeacherID)
TEACHER (TeacherID, Title, FirstName, Surname)
PUPILTRIP (PupilID, TripID)


(a) Draw an entity relationship diagram showing the relationship between the entities. 
(b) Write SQL statements for each of the following operations:
(i) find the first name and surname of all pupils who went on a trip with TripID 14.


(ii) find all the trips for which the teacher with surname "Black" has been in charge, giving
teacher's title and surname, trip description and start date, sorted in descending order
of start date.


(iii) find the first names and surnames of all the pupils who went on any trip with "Year 7" in
the description (e.g. "Year 7 Geography field trip" in May 2015), showing the first name
and surname of the teacher in charge.
Chapter 19 — Defining and updating tables
using SQL 


Objectives 


• Be able to use SQL to define a database table (A-Level only)
• Be able to use SQL to update, insert and delete data from multiple tables of a relational database
  (A-Level only)


Defining a database table 


The following example shows how to create a new database table.


Example 1 


Use SQL to create a table named Employee, which has four columns: EmpID (a compulsory integer field
which is the primary key), EmpName (a compulsory character field of maximum length 20), HireDate (an
optional date field) and Salary (an optional real number field).


CREATE TABLE Employee
(
EmpID INTEGER NOT NULL PRIMARY KEY,
EmpName VARCHAR(20) NOT NULL,
HireDate DATE,
Salary CURRENCY
)
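
The Employee definition can be tested in SQLite through Python's built-in sqlite3 module. This is a sketch: SQLite accepts type names such as VARCHAR(20), DATE and CURRENCY but treats them only as hints, so other database products may behave differently. The employee name is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Employee
(
    EmpID    INTEGER NOT NULL PRIMARY KEY,
    EmpName  VARCHAR(20) NOT NULL,
    HireDate DATE,
    Salary   CURRENCY
)
""")

# Optional fields may be left out; they are stored as NULL (None in Python).
con.execute("INSERT INTO Employee (EmpID, EmpName) VALUES (1, 'Bloggs')")

try:
    # EmpName is compulsory, so leaving it out should be refused.
    con.execute("INSERT INTO Employee (EmpID) VALUES (2)")
    missing_name_rejected = False
except sqlite3.IntegrityError:
    missing_name_rejected = True

stored = con.execute("SELECT EmpID, EmpName, HireDate, Salary FROM Employee").fetchall()
print(missing_name_rejected, stored)
```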


Data types 


Some of the most commonly used data types are described in the table below. (The data types vary 
depending on the specific implementation.) 


Data type  | Description                                     | Example
CHAR(n)    | Character string of fixed length n              | CHAR(6)
VARCHAR(n) | Character string of variable length, maximum n  | VARCHAR(25)
FLOAT      | Number with a floating decimal point            | Length FLOAT (10,2) (maximum number of digits is 10
           |                                                 | and maximum number after decimal point is 2)
DATE       | Stores Day, Month, Year values                  | HireDate DATE
TIME       | Stores Hour, Minute, Second values              | RaceTime TIME
CURRENCY   | Formats numbers in the currency used in         | EntryFee £23.50
           | your region                                     |
Altering a table structure


The ALTER TABLE statement is used to add, delete or modify columns (i.e. fields) in an existing table. 


To add a column (field): 


ALTER TABLE Employee 
ADD Department VARCHAR (10) 


To delete a column: 


ALTER TABLE Employee 
DROP COLUMN HireDate 


To change the data type of a column: 


ALTER TABLE Employee
MODIFY COLUMN EmpName VARCHAR(30) NOT NULL


Defining linked tables 
If you set up several tables, you can link tables by creating foreign keys. 
Example 2 


Suppose that an extra table is to be added to the Employee database which lists the training courses 
offered by the company. A third table shows which date an employee attended a particular course. 


Employee ----< CourseAttendance >---- Course


The structure of the Employee table is: 


EmpID        Integer (Primary key)
Name         30 characters maximum
HireDate     Date
Salary       Currency
Department   30 characters maximum


The structure of the Course table is:


CourseID     6 characters, fixed length (Primary key)
CourseTitle  30 characters maximum (must be entered)
OnSite       Boolean


The structure of the CourseAttendance table is:


CourseID     6 characters, fixed length (foreign key)
EmpID        Integer (foreign key)
CourseDate   Date (note that the same course may be run several times on different dates)

CourseID and EmpID form a composite primary key.


The CourseAttendance table is created using the SQL statements: 


CREATE TABLE CourseAttendance
(
CourseID CHARACTER (6) NOT NULL,
EmpID INTEGER NOT NULL,
CourseDate DATE,
FOREIGN KEY (CourseID) REFERENCES Course(CourseID),
FOREIGN KEY (EmpID) REFERENCES Employee(EmpID),
PRIMARY KEY (CourseID, EmpID)
)
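The effect of the foreign keys and the composite primary key can be tried out with a short Python sketch using the standard sqlite3 module. This is only an illustration under some assumptions: SQLite is used as a stand-in database (it accepts type names such as CURRENCY but only enforces foreign keys when asked), and the course code, course title and dates are made up.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE Employee (
    EmpID      INTEGER NOT NULL PRIMARY KEY,
    Name       VARCHAR(30),
    HireDate   DATE,
    Salary     CURRENCY,
    Department VARCHAR(30)
);
CREATE TABLE Course (
    CourseID    CHARACTER(6) NOT NULL PRIMARY KEY,
    CourseTitle VARCHAR(30) NOT NULL,
    OnSite      BOOLEAN
);
CREATE TABLE CourseAttendance (
    CourseID   CHARACTER(6) NOT NULL,
    EmpID      INTEGER NOT NULL,
    CourseDate DATE,
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID),
    FOREIGN KEY (EmpID) REFERENCES Employee(EmpID),
    PRIMARY KEY (CourseID, EmpID)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Employee (EmpID, Name) VALUES (1122, 'Bloggs')")
conn.execute("INSERT INTO Course VALUES ('PY0001', 'Intro to Python', 1)")
conn.execute("INSERT INTO CourseAttendance VALUES ('PY0001', 1122, '2001-03-01')")


def try_insert(course_id, emp_id):
    """Return True if the attendance row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO CourseAttendance VALUES (?, ?, '2001-04-01')",
            (course_id, emp_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


unknown_employee_ok = try_insert("PY0001", 9999)  # no such employee: foreign key fails
duplicate_key_ok = try_insert("PY0001", 1122)     # composite key already used
attendance_rows = conn.execute("SELECT COUNT(*) FROM CourseAttendance").fetchone()[0]
```

Both extra inserts are rejected, so only the original attendance record remains in the table.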


Inserting, updating, and deleting data using SQL 
The SQL INSERT INTO statement 
This statement is used to insert a new record in a database table. The syntax is: 


INSERT INTO tableName (column1, column2, ...)
VALUES (value1, value2, ...)
Example: add a record for employee number 1122, Bloggs, who was hired on 1/1/2001 for the technical 
department at a salary of £18000. 


INSERT INTO Employee (EmpID, Name, HireDate, Salary, Department) 
VALUES ("1122", "Bloggs", #1/1/2001#, 18000, "Technical")


Note that if values for all the fields are supplied in the correct order, the field names in the
brackets above do not need to be specified. INSERT INTO Employee would be sufficient.


Example: add a record for employee number 1125, Cully, who was hired on 1/1/2001. Salary and
Department are not known.


INSERT INTO Employee (EmpID, Name, HireDate) 
VALUES ("1125", "Cully", #1/1/2001#) 
The SQL UPDATE statement


This statement is used to update a record in a database table. The syntax is:


UPDATE tableName
SET column1 = value1, column2 = value2, ...
WHERE columnX = value


Example: increase all salaries of members of the Technical department by 10% 


UPDATE Employee 
SET Salary = Salary*1.1 
WHERE Department = "Technical" 


Example: Update the record for Bloggs, who has moved to Administration. 


UPDATE Employee 
SET Department = "Administration"
WHERE EmpID = "1122" 


The SQL DELETE statement 
This statement is used to delete a record from a database table. The syntax is: 


DELETE FROM tableName 
WHERE columnX = value 


Example: Delete the record for Bloggs. 


DELETE FROM Employee 
WHERE EmpID = "1122" 
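The three statements can be run in sequence from Python as a quick check of what each one does. This is a minimal sketch, assuming SQLite (via the standard sqlite3 module) as the database; it uses ISO-format date strings and unquoted integers for EmpID rather than the #...# date and quoted number style shown above, since literal syntax varies between implementations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    EmpID INTEGER NOT NULL PRIMARY KEY, Name VARCHAR(30),
    HireDate DATE, Salary CURRENCY, Department VARCHAR(30))""")

# INSERT: one complete record, and one with Salary and Department left unknown (NULL).
conn.execute("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
             (1122, "Bloggs", "2001-01-01", 18000, "Technical"))
conn.execute("INSERT INTO Employee (EmpID, Name, HireDate) VALUES (?, ?, ?)",
             (1125, "Cully", "2001-01-01"))

# UPDATE: increase the salaries of the Technical department by 10%.
conn.execute("UPDATE Employee SET Salary = Salary * 1.1 WHERE Department = ?",
             ("Technical",))
salary_after_rise = conn.execute(
    "SELECT Salary FROM Employee WHERE EmpID = 1122").fetchone()[0]

# UPDATE: Bloggs moves to Administration.
conn.execute("UPDATE Employee SET Department = ? WHERE EmpID = ?",
             ("Administration", 1122))

# DELETE: remove the record for Bloggs.
conn.execute("DELETE FROM Employee WHERE EmpID = ?", (1122,))
remaining = conn.execute(
    "SELECT EmpID, Name, Salary, Department FROM Employee").fetchall()
```

After the DELETE, only Cully's record is left, with Salary and Department still empty. Passing values as ? parameters, rather than building the SQL text by hand, is a design choice of this sketch and not part of the SQL syntax described above.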
Exercises

A-Level only


1. A car dealer accepts orders for new vehicles from its customers, and puts in an order to the
   manufacturer for the customised vehicle(s). There may be more than one vehicle on the customer
   order if, for example, a company is replacing its fleet of hire cars. When a car arrives, a member
   of staff telephones or emails the customer to inform them that it is ready for collection.

   Details of the vehicles, customers and orders are to be stored in a relational database using the
   following four relations:

   Vehicle (VehicleID, VehicleName, Model, Price, SupplierName)
   CustomerOrder (OrderID, CustomerID, Date)
   CustomerOrderLine (OrderID, VehicleID)
   Customer (CustomerID, CustomerName, EmailAddress, TelephoneNumber)

   (a) These relations are in Third Normal Form (3NF).
       (i) What does this mean? [2]
       (ii) Why is it important that the relations in a relational database are in Third Normal Form? [2]

   (b) On the incomplete entity relationship diagram below show the degree of any three relationships
       that exist between the entities. [3]

   (c) Complete the following SQL statement to create the Vehicle relation, including the key field.

       CREATE TABLE Vehicle ( [3]

   (d) A fault has been identified with all cars of Model 10765. The manager needs
       a list of the names and telephone numbers of all the customers who have purchased this
       type of car so that they can be contacted and the car recalled for modification. This list
       should contain no additional details and must be presented in alphabetical order of the
       names of the customers.

       Write an SQL query that will produce this list. [6]
Chapter 20 — Transaction processing


Objectives 


• Describe methods of capturing, selecting, managing and exchanging data
• Describe what is meant by transaction processing and ACID (Atomicity, Consistency,
  Isolation, Durability) (A-Level only)
• Describe what is meant by record locking and why it is necessary in a multi-user database
  (A-Level only)
• Describe what is meant by redundancy (A-Level only)


Capturing data 


Before data is added to a database, it has to be captured or input by some means or other. Manual 
methods include transcribing data from a form that has been filled in, for example by a customer ordering 
items from a catalogue or a market researcher filling in forms on the High Street. 


Cheques paid in at a bank are scanned using magnetic ink character recognition (MICR); the bank 
number, customer account number and cheque number are printed in special magnetic ink along the 
bottom of the cheque. The amount of the cheque has to be manually entered by the bank clerk. 


Some forms such as lottery tickets, multiple choice questionnaires or exams may be read using optical
mark recognition (OMR), and other types of form using optical character recognition (OCR).


Other automated methods include smart card readers, scanners such as those used at airports to scan 
passports and barcode readers or scanners. 




Selecting and managing data 


Data may be selected before it is even added to a database, depending on whether or not it 
matches specified criteria. For example, a speed camera may automatically photograph only 
those vehicles which are exceeding the speed limit. 


Once in the database, SQL may be used to select data from different tables which match 
required criteria. Using the selected data, reports may be produced, letters sent out by post 
or email, new stock items automatically re-ordered, records added, updated or deleted. 


Exchanging data 


A common method of transferring data between one computer system and another (usually via the
Internet) without the need for human intervention is EDI (Electronic Data Interchange). Using standardised
message formatting, documents can be exchanged electronically. Transaction software processes
the information and the software on the receiving end looks up details of, for example, items to be
purchased, price, buyer's name and address etc. in an order processing system.


EDI can be used in countless different applications, such as by Exam Boards to send results to schools, 
or by insurance companies to check that an applicant has a driver's licence. 


Transaction processing and ACID

A-Level only


In the context of databases, a single logical operation on data is defined as a transaction. For example, 
a customer booking a cinema ticket, and making an online payment using a credit card, is a single 
transaction even though it involves multiple actions. 


The database system has to ensure that it is not possible to complete only part of a transaction, for 
example booking the cinema ticket without paying for it. ACID (Atomicity, Consistency, Isolation, 
Durability) is a set of properties that guarantees that transactions are processed reliably. 


Atomicity 


Atomicity requires that a transaction must be processed in its entirety or not at all. Atomicity must 
guarantee that in any situation, including power cuts or hard disk crashes, it is not possible to process 
only part of a transaction. 


Consistency 


The consistency property ensures that no transaction can violate any of the defined validation rules for 
maintaining the integrity of the database. When a database is created, referential integrity rules will be 
specified between linked tables (see Chapter 16). Thus it will not be possible, for example, to record a 
mark in a RESULTS table for a student who is not in the STUDENT table in the database. Similarly, it will
not be possible to delete a student's record from the STUDENT table if that student has marks in the
RESULTS table.


Isolation 


The isolation property ensures that concurrent execution of transactions leads to the same results as if 
transactions were processed one after the other. 


Durability 


The durability property ensures that once a transaction has been committed, it will remain so, even
in the event of a power cut. For example, if the online sale of a cinema ticket is in the process of being
completed, it should not be possible for the number of seats sold to be updated but the customer's debit
card not processed. As each part of the transaction is completed, it is held in a buffer on disk until all
elements of the transaction are completed. Only then will the changes to the database tables be made.
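The all-or-nothing behaviour in the cinema ticket example can be sketched in Python with the standard sqlite3 module. The table names, column names and ticket price are assumptions made for illustration; the point is only that a failure part-way through leaves the seat count unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Screening (ScreeningID INTEGER PRIMARY KEY, SeatsSold INTEGER NOT NULL);
CREATE TABLE Payment (PaymentID INTEGER PRIMARY KEY, ScreeningID INTEGER, Amount REAL);
INSERT INTO Screening VALUES (1, 40);
""")


def book_ticket(card_ok):
    """Update the seat count and record the payment as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE Screening SET SeatsSold = SeatsSold + 1 WHERE ScreeningID = 1")
            if not card_ok:
                raise RuntimeError("card declined")  # simulated failure part-way through
            conn.execute("INSERT INTO Payment (ScreeningID, Amount) VALUES (1, 8.50)")
        return True
    except RuntimeError:
        return False


def seats_sold():
    return conn.execute(
        "SELECT SeatsSold FROM Screening WHERE ScreeningID = 1").fetchone()[0]


failed = book_ticket(card_ok=False)
seats_after_failure = seats_sold()    # still 40: the seat update was rolled back
succeeded = book_ticket(card_ok=True)
seats_after_success = seats_sold()    # 41: both parts were committed together
payments = conn.execute("SELECT COUNT(*) FROM Payment").fetchone()[0]
```

Using the connection as a context manager is one convenient way to get commit-or-rollback behaviour in sqlite3; other database libraries expose the same idea through explicit commit and rollback calls.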
Potential problems with multi-user databases


Allowing multiple users to simultaneously update a database table may cause one of the updates to be 
lost unless measures are taken to prevent this. 


When an item is updated, the entire record (indeed the whole block in which the record is physically 
held) will be copied into the user's own local memory area at the workstation. When the record is saved, 
the block is rewritten to the file server. Imagine the following situation: 


User A accesses a customer record, thereby causing it to be copied into the memory at his/her 
workstation, and starts to type in a new address for the customer. 


User B accesses the same customer record, and alters the credit limit and then saves the record and 
calls up the next record that needs updating. 


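User A completes the address change, and saves the record. Because the whole record is rewritten from
User A's copy, which still holds the old credit limit, User B's update is lost.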
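The sequence of events above can be simulated in a few lines of Python. This is only a sketch: the record is modelled as a dictionary, the field values are invented, and "saving" simply means copying a user's whole local record back over the server copy.

```python
import copy

# The shared copy of the record held on the file server (values are made up).
server_record = {"name": "J Smith", "address": "1 Old Road", "credit_limit": 500}

# User A and User B each load their own local copy of the whole record.
copy_a = copy.deepcopy(server_record)
copy_b = copy.deepcopy(server_record)

# User B changes the credit limit and saves first.
copy_b["credit_limit"] = 1000
server_record = copy.deepcopy(copy_b)

# User A finishes typing the new address and saves the whole record,
# including the stale credit limit that was read before B's change.
copy_a["address"] = "2 New Street"
server_record = copy.deepcopy(copy_a)
```

At the end, server_record holds the new address but the old credit limit of 500, which is exactly the lost update described.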


There are several methods which may be employed to avoid updates being lost. 


Record locks 


Record locking is the technique of preventing simultaneous access to objects in a database in order to 
prevent updates being lost or inconsistencies in the data arising. In its simplest form, a record is locked 
whenever a user retrieves it for editing or updating. Anyone else attempting to retrieve the same record is 
denied access until the transaction is completed or cancelled. 
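In its simplest form, record locking can be pictured as a table recording who currently holds each record. The class and method names below are hypothetical and chosen for illustration; a real DBMS manages its locks internally.

```python
class LockManager:
    """Tracks which user currently holds the lock on each record."""

    def __init__(self):
        self._owners = {}  # record id -> user holding the lock

    def acquire(self, record_id, user):
        """Grant the lock if the record is free (or already held by this user)."""
        holder = self._owners.get(record_id)
        if holder is None or holder == user:
            self._owners[record_id] = user
            return True
        return False  # someone else is editing the record: access denied

    def release(self, record_id, user):
        """Release the lock when the transaction is completed or cancelled."""
        if self._owners.get(record_id) == user:
            del self._owners[record_id]


locks = LockManager()
a_got_lock = locks.acquire("customer-42", "User A")
b_got_lock = locks.acquire("customer-42", "User B")  # denied while A holds the lock
locks.release("customer-42", "User A")
b_retry = locks.acquire("customer-42", "User B")     # succeeds once A has finished
```

User B is refused while User A holds the record and succeeds only after the lock is released, so the two edits can no longer overwrite each other.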


Problems with record locking 


If two users are attempting to update two records, a situation can arise in which neither can proceed, 
known as deadlock. Suppose a bank clerk is updating Customer A's record with a transfer to Customer 
B’s account. Meanwhile a second bank clerk is trying to update Customer B's record, as he needs to 
transfer money to Customer A's account. 


User 1                                  User 2
locks Customer A's record               locks Customer B's record
tries to access Customer B's record     tries to access Customer A's record
waits ...                               waits ...
                    DEADLOCK!


The DBMS must recognise when this situation has occurred and take action. Serialisation, timestamp 
ordering or commitment ordering may be used. 
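One way a system might recognise deadlock, offered here as an illustrative assumption rather than as the method any particular DBMS uses, is to record which user is waiting for which other user and look for a cycle. The sketch assumes each user waits for at most one other user at a time.

```python
def find_deadlock(waits_for):
    """Return True if the waits-for relationships contain a cycle.

    waits_for maps each waiting user to the user whose lock it is waiting on.
    Users that are not waiting do not appear as keys.
    """
    for start in waits_for:
        seen = set()
        current = start
        while current in waits_for:
            if current in seen:
                return True  # we have come back to a user already on this chain
            seen.add(current)
            current = waits_for[current]
    return False


# The situation in the table above: each user waits for the other.
deadlocked = find_deadlock({"User 1": "User 2", "User 2": "User 1"})

# No deadlock: User 1 waits for User 2, but User 2 is not waiting for anyone.
not_deadlocked = find_deadlock({"User 1": "User 2"})
```

Once a cycle is found, the system can take action, for example by cancelling one of the transactions so that the other can proceed.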


Serialisation 
This is a technique which ensures that transactions do not overlap in time and therefore cannot interfere 
with each other or lead to updates being lost. A transaction cannot start until the previous one has 
finished. It can be implemented using timestamp ordering. 
Timestamp ordering


Whenever a transaction starts, it is given a timestamp, so that if two transactions affect the same object 
(for example record or table), the transaction with the earlier timestamp should be applied first. 


In order to ensure that transactions are not lost, every object in the database has a read timestamp and 
a write timestamp, which are updated whenever an object in a database is read or written. 


When a transaction starts, it reads the data from a record causing the read timestamp to be set. When
it writes the updated data back to the record it will check the read timestamp. If this is not the same as
the value that was saved when this transaction started, it will know that another transaction is also taking
place on the record. A range of potential problems can thus be identified and avoided.
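The check described above can be sketched as follows. This is a simplified model under stated assumptions: a counter stands in for real clock timestamps, the class names are hypothetical, and a rejected write simply returns False rather than restarting the transaction.

```python
import itertools

_clock = itertools.count(1)  # logical clock standing in for real timestamps


class Record:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # updated whenever the record is read
        self.write_ts = 0  # updated whenever the record is written


class Transaction:
    def __init__(self, record):
        self.record = record
        # Reading the record sets its read timestamp; remember the value we set.
        record.read_ts = next(_clock)
        self.saved_read_ts = record.read_ts
        self.value = record.value

    def write(self, new_value):
        """Write back only if no other transaction has read the record since we did."""
        if self.record.read_ts != self.saved_read_ts:
            return False  # another transaction is also using this record
        self.record.value = new_value
        self.record.write_ts = next(_clock)
        return True


account = Record(100)
t1 = Transaction(account)
t2 = Transaction(account)        # t2 reads after t1, changing the read timestamp
t1_ok = t1.write(t1.value + 50)  # rejected: the read timestamp no longer matches
t2_ok = t2.write(t2.value - 30)  # accepted
```

The first transaction's write is refused because the record's read timestamp changed after it started, so its stale data cannot overwrite the record.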


Commitment ordering 


This is another serialisation technique used to ensure that transactions are not lost when two or more 
users are simultaneously trying to access the same database object. Transactions are ordered in terms 
of their dependencies on each other as well as the time they were initiated. It can be used to prevent 
deadlock by blocking one request until another is completed. 


Redundancy 


Very many organisations such as banks, airport systems, hospitals, and others cannot afford to have their 
computer systems go down even for a few seconds, with consequent loss of transaction data. These 
organisations maintain two or even three identical systems in different geographical locations, so that 
every transaction is written to two or three different storage facilities. This built-in hardware redundancy 
protects against loss of data in the event of power failure or other disasters.


If one system fails, the backup system automatically takes over and processing can continue. 


Exercises 


1. (a) Explain how, in a client-server database with multiple users, an update made by one user
       may not be recorded if the database management system does not have measures in place
       to ensure the integrity of the database. [3]
   (b) Explain what is meant by deadlock and how this can arise. [2]
   (c) Name and describe briefly a method of preventing this from happening. [2]
2. (a) Describe what is meant by referential integrity in a database. [2]
   (b) Describe what is meant by the ACID model in database theory. [6]
Chapter 21 — Structure of the Internet


Objectives 


• Understand the structure of the Internet
• Describe the term 'Uniform Resource Locator' in the context of networking
• Understand the purpose and function of the Domain Name Server (DNS) system
• Explain the terms 'domain name' and 'IP address'
• Describe how domain names are organised
• Describe the characteristics of LANs and WANs


A short history of the Internet and the World Wide Web 


The Internet is a network of networks set up to allow computers to communicate with each other globally.
A United States defence project in the 1960s (ARPA) created ARPANET to enable distant departments
working on the same project to communicate without the need for physical travel. The project developed,
as did their means of communication, and the Internet idea was born. In 1995 the Internet became a
public hit when the World Wide Web emerged and user numbers began to climb, reaching 2.5 billion
users worldwide in 2015, roughly one third of the world's population.

The World Wide Web (WWW) is a collection of web pages that reside on computers connected to the
Internet. It uses the Internet as a service to communicate the information contained within these pages.
The concept of the WWW and using a browser to search the information contained within it was first
developed by Sir Tim Berners-Lee, a British scientist working at CERN in Geneva, Switzerland. The World
Wide Web is not the same as the Internet and even today, the Internet is frequently used without using
the WWW.


[Figure: Global Internet users (1995 - 2015), plotting Internet users (millions) and percentage of the
world's population]
The physical structure of the Internet


Each continent uses backbone cables connected by trans-continental leased lines fed across the sea 
beds. National Internet Service Providers (ISPs) connect directly to this backbone and distribute the 
Internet connection to smaller providers who in turn provide access to individual homes and businesses. 


Trans-continental Internet connections, TeleGeography 


Uniform Resource Locators (URLs) 


A Uniform Resource Locator is the full address of an Internet resource. It specifies the location of a 
resource on the Internet, including the resource name and usually the file type, so that a browser can 
request it from the website server. 


Method    Host    Location    Resource

http://www.domainname.com/folder/subfolder/webpage.html#element

[Figure: the parts of a URL]
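The parts of a URL can be picked apart with Python's standard urllib.parse module. The sketch uses the example address from the figure; the variable names are chosen for illustration, and splitting the path into a location and a resource at the last slash is an assumption about how the figure's labels map onto the address.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.domainname.com/folder/subfolder/webpage.html#element")

method = parts.scheme            # the protocol used to request the resource
host_and_domain = parts.netloc   # the host (www) followed by the domain name
path = parts.path                # folders and file name on the server
fragment = parts.fragment        # a named element within the page

# Split the path into the location (folders) and the resource (file name).
location, _, resource = path.rpartition("/")
```

Here method is "http", host_and_domain is "www.domainname.com", location is "/folder/subfolder", resource is "webpage.html" and fragment is "element".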


Internet registries and registrars 


Internet registrars hold records of all existing website names and the details of those domains that are 
currently available to purchase. These are companies that act as resellers for domain names and allow 
people and companies to purchase them. All registrars must be accredited by their governing registry. 


Internet registries are five global organisations governed by the Internet Corporation for Assigned 
Names and Numbers (ICANN) with worldwide databases that hold records of all the domain names 
currently issued to individuals and companies, and their details. These details include the registrant's 
name, type (company or individual), registered mailing address, the registrar that sold the domain name 
and the date of registry. The registries also allocate IP addresses and keep track of which address(es) a 
domain name is associated with as part of the Domain Name System (DNS). 


Domain names and the Domain Name System (DNS) 


A domain name identifies the area or domain that an Internet resource resides in. These are structured 
into a hierarchy of smaller domains and written as a string separated by full stops as dictated by the rules 
of the Domain Name System (DNS). 


[Figure: the domain name hierarchy, from the <root> through generic and country TLDs and 2LDs to 3LDs
such as bbc, ebay and lidl]


A hierarchical domain system from Top Level Domains (TLDs) to 3rd Level Domains (3LDs) 
Each domain name has one or more equivalent IP addresses. The DNS catalogues all domain names
and IP addresses in a series of global directories that domain name servers can access in order to find
the correct IP address location for a resource. When a webpage is requested using the URL a user
enters, the browser requests the corresponding IP address from a local DNS server. If that server does
not have the correct IP address, the search is extended up the hierarchy to another, larger DNS database.
Once the IP address is located, a data request is sent by the user's computer to that location to find the
web page data. A webpage can be accessed within a browser by entering the IP address if it is known.
Try entering 74.125.227.176 into a browser.
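The idea of extending a search up the hierarchy can be modelled with a few dictionaries. This is a toy simulation, not a real DNS client: the three levels, the domain names and the IP addresses are all made up (the addresses come from a range reserved for documentation).

```python
# Hypothetical directories, smallest first. All names and addresses are invented.
LOCAL_DNS = {"www.example.co.uk": "192.0.2.10"}
REGIONAL_DNS = {"mail.example.co.uk": "192.0.2.20"}
ROOT_LEVEL_DNS = {"www.example.com": "192.0.2.30"}

# Search order: the local server first, then progressively larger databases.
HIERARCHY = [
    ("local", LOCAL_DNS),
    ("regional", REGIONAL_DNS),
    ("root-level", ROOT_LEVEL_DNS),
]


def resolve(domain_name):
    """Return (ip_address, level_that_answered), or (None, None) if not found."""
    for level, directory in HIERARCHY:
        if domain_name in directory:
            return directory[domain_name], level
    return None, None


local_hit = resolve("www.example.co.uk")     # answered by the local server
escalated = resolve("www.example.com")       # found only after moving up the hierarchy
unknown = resolve("no-such-site.example")    # not found at any level
```

A name held locally is answered straight away, while other names are only resolved after the search has moved to a larger database.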


[Figure: the parts of a domain name, labelled Host, Website, Company 2LD and Country TLD, with brackets
marking the website domain name and the Fully Qualified Domain Name (FQDN)]


Fully Qualified Domain Name (FQDN) 


A fully qualified domain name is one that includes the host server name, for example www, mail or 
ftp depending on whether the resource being requested is hosted on the web, mail or ftp server. This 
would be written as www.websitename.co.uk or mail.website.co.uk for example. 


IP addresses 


An IP or Internet Protocol address is a unique address that is assigned to a network device. An IP 
address performs a similar function to a home mailing address. 


130.142.37.108 


The IP address indicates where a packet of data is to be sent or has been sent from. Routers can use 
this address to direct the data packet accordingly. If a domain name is associated with a specific IP 
address, the IP address is the address of the server that the website resides on. 


Wide Area Networks (WANs) 


As a network of inter-connected networks, the Internet comprises millions, if not billions of Local Area 
Networks and individual users to form the world's largest Wide Area Network. 


A Wide Area Network is generally defined to be one that relies on third party carriers or connections 
such as those provided by British Telecom. WANs are typically spread over a large geographical area, 
even across continents. 


Local Area Networks (LANs) 


A Local Area Network consists of a number of computing devices on a single site or in a single 
building, connected together by cables. The network may consist of a number of PCs, other devices 
such as printers and scanners, and a central server. Users on the network can communicate with each 
other, as well as sharing data and hardware devices such as printers and scanners. 


LANs can transmit data very fast but only over a short distance. 
Physical bus topology


A LAN can use different layouts or topologies. In a bus topology, all computers are connected to a single 
cable. The ends of the cable are plugged into a terminator. 


[Figure: bus topology with a file/print server, a printer and computers attached to a single cable]

Advantage of a bus topology 


• Inexpensive to install as it requires less cable than a star topology and does not require any
  additional hardware


Disadvantages of a bus topology


• If the main cable fails, network data can no longer be transmitted to any of the nodes
• Performance degrades with heavy traffic
• Low security - all computers on the network can see all data transmissions


Physical star topology 


A star network has a central node, which may be a switch or a computer acting as a router to
transmit messages. A switch keeps a record of the unique MAC address (see Chapter 22) of each device
on the network and can identify which particular computer on the network it should send the data to.


[Figure: star topology, with each computer connected by its own cable to a central node]


Advantages of a star topology

• If one cable fails, only one station is affected, so it is simple to isolate faults
• Consistent performance even when the network is being heavily used
• Higher transmission speeds can give better performance than a bus network
• No problems with 'collisions' of data since each station has its own cable to the server
• The system is more secure as messages are sent directly to the central computer and cannot be
  intercepted by other stations
• Easy to add new stations without disrupting the network


Disadvantages of a star network

• May be costly to install because of the length of cable required
• If the central device goes down, network data can no longer be transmitted to any of the nodes


Physical vs logical topology 


The physical topology of a network is its actual design layout, which is important when you select a 
wiring scheme and design the wiring for a new network. 


The logical topology is the shape of the path the data travels in, and describes how components 
communicate across the physical topology. The physical and logical topologies are independent of each 
other, so that a network physically wired in star topology can behave logically as a bus network by using 
a bus protocol and appropriate physical switching. 


For example, any variety of Ethernet uses a logical bus topology when components communicate,
regardless of the physical layout of the cable.


Wi-Fi 


Wi-Fi is a local area wireless technology that enables you to connect a device such as a PC, smartphone,
digital audio player, laptop or tablet computer to a network resource or to the Internet via a wireless
network access point (WAP). An access point has a range of about 20 metres indoors, and
more outdoors.


In 1999, the Wi-Fi Alliance was formed to establish international standards for interoperability and 
backward compatibility. The Alliance consists of a group of several hundred companies around the world, 
and enforces the use of standards for device connectivity and network connections. 
Wireless Access Point (WAP)


In order to connect to a wireless network, a computer device needs a wireless network adaptor.
The combination of computer and interface controller is called a station. All stations share a single radio
frequency communication channel, and each station is constantly tuned in on this frequency to pick up
transmissions. Transmissions are received by all the stations within range of the wireless access point.


To connect to the Internet, the WAP usually connects to a router, but it can also be an integral part of 
the router itself. 


[Figure: a laptop computer, a wireless access point and a printer]

A laptop connected wirelessly to a printer


Mesh network topologies 


Mesh networks are becoming more common with the widespread use of wireless technology. Each node
in a mesh network can reach every other node, either directly or by transmitting data across any
intermediate nodes. Only one node requires a connection to the Internet and all others can share this
connection. Mesh networks can quickly become big enough to cover entire cities.


[Figure: a mesh network in which one node has an Internet connection]


Advantages of a wireless mesh network 


The advantages of a mesh network include:

• No cabling costs
• The more nodes that are installed, the faster and more reliable the network becomes, since one
  blocked or broken connection (as shown above) can easily be circumvented by another route. In this
  respect, the mesh topology can be described as 'self healing'.
• New nodes are automatically incorporated into the network
• Faster communication since data packets do not need to travel via a central switch
Exercises


1. A Uniform Resource Locator (URL) is the address of a resource on the Internet. For example,
   http://www.pgonline.co.uk/courses/alevel/computing_test.html.

   Explain the different parts of the address.
   (a) www.
   (b) pgonline.co.uk
   (c) /courses/alevel/computing_test.html

2. A village hall committee is considering purchasing a lease on a web domain to set up a new
   website to advertise their events. They have been advised to contact an Internet registrar.

   (a) Explain the role of an Internet registrar.
   (b) What is the primary role of an Internet Service Provider (ISP)?

3. Mahmood wants to create a small office network for a home enterprise.

   (a) Describe what is meant by a LAN.
   (b) Suggest two items of hardware that would be required to create a wireless LAN.




Chapter 22 — Internet communication 


Objectives 


• Describe circuit switching and packet switching
• Understand the role of packet switching and routers
• Understand the function of network hardware devices
• Understand the importance of protocols and standards
• Describe the roles of the four layers in the TCP/IP protocol stack
• Be familiar with transferring files using FTP as an anonymous and non-anonymous user
• Explain the role of an email server in sending and retrieving email


Circuit switching 


Circuit switching creates a direct link between two devices for the duration of the communication. 
The public telephone system is an example of a circuit switched network. When a caller dials a 
number, various switches in telephone exchanges set up a path between the caller and the recipient. 
The connection is set up for the entire duration of the call including periods of silence and pauses. 
This enables two people to hold a call without any delay in the delivery of speech. 


If two computers use the circuit switching principle, bandwidth is wasted during the periods when no
data is being sent. The two devices must also transmit and receive data at the same rate, so circuit
switched networks can only connect computers or devices that operate at the same transfer rate.
On the other hand, since this is an exclusive connection between the two devices for the duration of the
communication, data segments (or packets) arrive in the same order that they are sent, simplifying the
process of reconstructing the message at the recipient end.


Because switches are used to connect and disconnect the circuits, electrical interference is produced 
and although this is not a serious problem for speech, it may produce corrupt or lost data if the path is 
being used to transmit data. If this is likely to be a problem, a leased line may be used instead. 


Packet switching 


Packet switching is a method of communicating packets of data across a network on which other 
similar communications are happening simultaneously. Website data that you receive arrives as a series 
of packets and an email will leave you in a series of packets. 


Data packets 


Data that is to be transmitted across a network is broken down into more manageable chunks called
packets. The size of each packet in a transmission can be fixed or variable, but most are between 500
and 1500 bytes. Each packet contains a header and a payload containing the body of data being sent.
Some packets may also use a trailer section with a checksum or Cyclic Redundancy Check (CRC)
to detect transmission errors. The sender calculates a check value from the data contained in the packet
and attaches it. A very simple check might just count the number of 1s in the data; a CRC uses a more
thorough calculation that detects many more kinds of error. The check value is recalculated for each
packet upon receipt and compared with the value sent, to help verify that the payload data has not
changed during transmission. If the two values differ, the packet is refused with suspected data
corruption and a new copy is requested from the sender.
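Attaching and re-checking a CRC can be sketched with the crc32 function from Python's standard zlib module. The packet layout used here (payload followed by a 4-byte trailer) is an assumption made for illustration and does not correspond to any particular network protocol.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload."""
    checksum = zlib.crc32(payload)
    return payload + checksum.to_bytes(4, "big")


def check_packet(packet: bytes) -> bool:
    """Recalculate the CRC on receipt and compare it with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


packet = make_packet(b"Hello, world")
intact = check_packet(packet)

# Flip one bit of the payload to simulate corruption in transit.
corrupted = bytes([packet[0] ^ 0b00000001]) + packet[1:]
damaged_ok = check_packet(corrupted)
```

The intact packet passes the check, while the packet with a single flipped bit fails it and would be requested again from the sender.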
The header (much like the box(es) of a consignment you might send or receive through the post) includes
the sender's and the recipient's IP addresses, the protocol being used with this type of packet and the
number of the packet in the sequence being sent, e.g. packet 1 of 8. It also includes the Time To Live
(TTL) or hop limit, after which point the data packet expires and is discarded.


[Figure: three packets (Packet 3 of 3, Packet 2 of 3, Packet 1 of 3), each with a header, payload and trailer]

Data packets queueing to be sent


The payload of the packet contains the actual data being sent. Upon receipt, the packets are 
reassembled in the correct order and the data is extracted. 
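Splitting a message into numbered packets and reassembling it on receipt can be sketched as follows. The header here is reduced to a sequence number and a packet count, and the function names and the 8-byte payload size are assumptions chosen to keep the example small.

```python
import random


def to_packets(data: bytes, payload_size: int):
    """Split data into packets, each a dict with a small header and a payload."""
    chunks = [data[i:i + payload_size] for i in range(0, len(data), payload_size)]
    return [
        {"seq": number, "total": len(chunks), "payload": chunk}
        for number, chunk in enumerate(chunks, start=1)
    ]


def reassemble(packets):
    """Sort packets by sequence number and join their payloads."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing packets: request retransmission")
    return b"".join(p["payload"] for p in ordered)


message = b"Packets may arrive out of order."
packets = to_packets(message, payload_size=8)
random.shuffle(packets)          # simulate arrival in a different order
received = reassemble(packets)
```

Whatever order the packets arrive in, the sequence numbers in the headers allow the original message to be rebuilt.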


Routing packets across the Internet 


The success of packet switching relies on the ability of packets to be sent from sender to recipient along
entirely separate routes from each other. At the moment that a packet leaves the sender's computer, the
fastest or least congested route is taken to the recipient's computer. The packets can be easily
reassembled in the correct order at the receiving end and any packets that don't make it can be
requested again.
Routers

A-Level only


Each node in the diagram above represents a router. Routers are used to connect at least two networks,
commonly two LANs or WANs, or to connect a LAN and its ISP's network. The act of traversing between
one router and another across a network is referred to as a hop. The job of a router is to read the
recipient's IP address in each packet and forward it on to the recipient via the fastest and least congested
route to the next router, which will do the same until the packet reaches its destination. Routers use
routing tables to store and update the locations of other network devices and the most efficient routes
to them. A routing algorithm is used to find the optimum route. The routing algorithm used to decide
the best route can become a bottleneck in network traffic since the decision making process can be
complicated. A common shortest path algorithm used in routing is Dijkstra's algorithm. (See Chapter 64.)
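As a rough illustration of how a shortest path might be found, the sketch below applies Dijkstra's algorithm to a small made-up network of four routers. The router names and link costs are hypothetical; in practice a cost might represent delay or congestion on a link.

```python
import heapq


def shortest_path(graph, source, destination):
    """Dijkstra's algorithm: return (total_cost, route) from source to destination."""
    queue = [(0, source, [source])]  # (cost so far, router, route taken)
    visited = set()
    while queue:
        cost, router, route = heapq.heappop(queue)
        if router == destination:
            return cost, route
        if router in visited:
            continue  # a cheaper route to this router has already been processed
        visited.add(router)
        for neighbour, link_cost in graph[router].items():
            if neighbour not in visited:
                heapq.heappush(
                    queue, (cost + link_cost, neighbour, route + [neighbour]))
    return None  # destination unreachable


# Hypothetical network: each router lists its neighbours and the cost of each link.
network = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
cost, route = shortest_path(network, "A", "D")
```

For this network the cheapest route from A to D is A, C, B, D with a total cost of 8, which is cheaper than either A, B, D or A, C, D (both cost 9).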


When a router is connected to the Internet, the IP address of the port connecting it must be registered 
with the Internet registry because this IP address must be unique over the whole Internet. 


Gateways 
Routing packets from one network to another requires a router if the networks share the same protocols, for 
example TCP/IP. Where these protocols differ between networks, a gateway is used rather than a router 
to translate between them. All of the header data is stripped from the packet leaving only the raw data and 
new data is added in the format of the new network before the gateway sends the packet on its way again. 
Gateways otherwise perform a similar job to routers in moving data packets towards their destination. 


Media Access Control (MAC) addresses 


Every computer device, whether it's a PC, smartphone, laptop, printer or other device which is capable
of being part of a network, must have a wired or wireless Network Interface Card (NIC). Each NIC has a
unique Media Access Control address (MAC address), which is assigned and hard-coded into the card
by the manufacturer and which uniquely identifies the device. The address is 48 bits long, and is written
as 12 hex digits, for example:


00-09-5D-E3-F7-62 
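The link between 12 hex digits and 48 bits can be checked with a short Python sketch. The function name parse_mac is an assumption made for this example; each hex digit represents 4 bits, so 12 digits give 48 bits.

```python
def parse_mac(text):
    """Convert a MAC address such as 00-09-5D-E3-F7-62 into an integer."""
    digits = text.replace("-", "").replace(":", "")
    if len(digits) != 12:
        raise ValueError("a MAC address has exactly 12 hex digits")
    return int(digits, 16)

mac = parse_mac("00-09-5D-E3-F7-62")
bits = format(mac, "048b")    # binary string padded to 48 bits
print(bits)
print(len(bits))              # 48
```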


You can find out the MAC address of your PC by selecting Command Prompt from the Start menu in
Windows, and then typing ipconfig /all. This will display the physical address, i.e. MAC address.


[Screenshot: Windows Command Prompt output of ipconfig /all]


Displaying your computer's MAC address
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The importance of protocols and standards 


A protocol is a set of rules defining common methods of data communication. These rules need to
be standard across all devices in order for them to communicate with each other. HTTP (HyperText
Transfer Protocol) has become the standard protocol for browsers to request and receive web pages.
TCP/IP is also used worldwide and enables communication with any other computer connected to the
Internet regardless of its location.


The TCP/IP protocol stack 


The Transmission Control Protocol / Internet Protocol (TCP/IP) protocol stack is a set of networking
protocols that work together as four connected layers, passing incoming and outgoing data packets up
and down the layers during network communication.


There are four layers:
• Application layer
• Transport layer
• Internet layer
• Link layer


[Diagram: Terminal A and Terminal B each use all four layers (Application, Transport, Internet and Link);
the two routers between them use only the Internet and Link layers.]


Figure 22.1 The TCP/IP protocol stack


The role of the four layers in the stack 


Various protocols operate at each layer of the stack, each with different roles. In each layer, the data
to be sent is wrapped, or encapsulated, in an envelope containing new packet data as it descends the
layers and is unwrapped again at the receiving end in a networking equivalent of a game of pass
the parcel.
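The wrapping and unwrapping can be sketched in Python using nested dictionaries as the 'envelopes'. This is a simplified model, not how a real network stack stores packets. The IP addresses are borrowed from the worked example later in this chapter (which one is the source is an assumption), and the MAC addresses are invented placeholders.

```python
def encapsulate(data):
    """Wrap application data in transport, Internet and link headers in turn."""
    # Transport layer: adds the port number
    segment = {"port": 80, "payload": data}
    # Internet layer: adds source and destination IP addresses (roles assumed)
    packet = {"source_ip": "127.61.210.88",
              "destination_ip": "42.205.110.140",
              "payload": segment}
    # Link layer: adds source and destination MAC addresses (hypothetical values)
    frame = {"source_mac": "02:00:00:00:00:0A",
             "destination_mac": "02:00:00:00:00:0B",
             "payload": packet}
    return frame

def decapsulate(frame):
    """Unwrap each layer in reverse order at the receiving end."""
    packet = frame["payload"]      # link layer strips the MAC addresses
    segment = packet["payload"]    # Internet layer strips the IP addresses
    return segment["payload"]      # transport layer strips the port number

frame = encapsulate("Hello")
print(frame)
print(decapsulate(frame))          # Hello
```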


The application layer 


The application layer sits at the top of the stack and uses protocols relating to the application being
used to transmit data over a network, usually the Internet. If this application is a browser, for example, it
would select an appropriate higher-level protocol for the communication such as HTTP, POP3 or FTP.


Imagine the following text data is to be sent via a browser using the Hypertext Transfer Protocol (HTTP): 


“Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.”


Albert Einstein
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The transport layer 


The transport layer uses the Transmission Control Protocol (TCP) to establish an end-to-end 
connection with the recipient computer. The data is then split into packets and labelled with the packet 
number, the total number of packets and the port number through which the packet should route. This 
ensures it is handled by the correct application on the recipient computer. In the example below, port 80 
is used as this is a common port used by the HTTP protocol, called upon by the destination browser. 


If any packets go astray during the connection, the transport layer requests retransmission of lost
packets. Receipt of packets is also acknowledged.
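The splitting, numbering and reassembly described above can be sketched in Python. This is a simplified model that assumes a fixed packet size and ignores acknowledgements; the function names and dictionary keys are invented for illustration.

```python
import random

def split_into_packets(message, size):
    """Split a message into numbered packets, each labelled with the total and the port."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(chunks)
    return [{"number": n, "total": total, "port": 80, "data": chunk}
            for n, chunk in enumerate(chunks, start=1)]

def reassemble(packets):
    """Return (message, []) if complete, or (None, missing_numbers) if packets were lost."""
    received = {p["number"] for p in packets}
    total = packets[0]["total"]
    missing = [n for n in range(1, total + 1) if n not in received]
    if missing:
        return None, missing       # these packets would be requested again
    ordered = sorted(packets, key=lambda p: p["number"])
    return "".join(p["data"] for p in ordered), []

message = "Only two things are infinite, the universe and human stupidity"
packets = split_into_packets(message, 25)
random.shuffle(packets)            # packets may arrive in any order
print(reassemble(packets)[0] == message)    # True

lost_one = [p for p in packets if p["number"] != 2]
print(reassemble(lost_one))        # (None, [2])
```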


Packet 1 of 3          Packet 2 of 3          Packet 3 of 3
“Only two things       the universe and       and I'm not sure
are infinite,          human stupidity,       about the former.”
Port: 80               Port: 80               Port: 80


The Internet layer 


The Internet layer adds the source and destination IP addresses. Routers operate on the Internet
layer and will use these IP addresses to forward the packets on to the destination. The addition of an IP
address to the port number forms a socket, e.g. 42.205.110.140:80, in the same way that a person's
name is added to a street address on an envelope in order to direct the letter to the correct
person within a building. A socket specifies which device the packet must be sent to and the application
being used on that device.
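A socket written as text can be separated back into its two parts with a few lines of Python. The function name parse_socket is an assumption; ipaddress is part of the Python standard library.

```python
import ipaddress

def parse_socket(text):
    """Split a socket such as 42.205.110.140:80 into (IP address, port number)."""
    host, _, port_text = text.rpartition(":")
    address = ipaddress.ip_address(host)    # raises ValueError for an invalid IP address
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values (0 to 65535)")
    return str(address), port

print(parse_socket("42.205.110.140:80"))    # ('42.205.110.140', 80)
```

The IP address identifies the device and the port number identifies the application on that device.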


127.61.210.88          127.61.210.88          127.61.210.88
42.205.110.140         42.205.110.140         42.205.110.140
Packet 1 of 3          Packet 2 of 3          Packet 3 of 3
“Only two things       the universe and       and I'm not sure
are infinite,          human stupidity,       about the former.”
Port: 80               Port: 80               Port: 80


The link layer 


The link layer is the physical connection between network nodes and adds the unique Media Access
Control (MAC) addresses identifying the Network Interface Cards (NICs) of the source and destination
computers. This means that once the packet finds the correct network using the IP address, it can then
locate the correct piece of hardware. The destination MAC address is that of the device that the packet is
being sent to next. Unless the two computers are on the same network, the destination MAC address will
initially be the MAC address of the first router that the packet will be sent to.


38:B2:5A:78:E4:19      38:B2:5A:78:E4:19      38:B2:5A:78:E4:19
4A:62:BB:F2:09:10      4A:62:BB:F2:09:10      4A:62:BB:F2:09:10
127.61.210.88          127.61.210.88          127.61.210.88
42.205.110.140         42.205.110.140         42.205.110.140
Packet 1 of 3          Packet 2 of 3          Packet 3 of 3
“Only two things       the universe and       and I'm not sure
are infinite,          human stupidity,       about the former.”
At the receiving end, the MAC address is stripped off by the link layer, which passes the packets on
to the Internet layer. The IP addresses are then removed by the Internet layer, which passes the packets
on to the transport layer to remove the port numbers and reassemble the packets in the correct order.
The resulting data is then passed to the application which presents the data for the user. Since routers
operate on the Internet layer, source and destination MAC addresses are changed at each router node.
Packets, therefore, move up and down the lower layers in the stack as they pass through each router or
switch between the client and the server as shown in Figure 22.1.
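The way MAC addresses change at every hop while the IP addresses stay fixed can be illustrated with a small Python sketch. All of the MAC addresses below are invented placeholders, and treating 127.61.210.88 as the source is an assumption.

```python
def frames_along_route(packet, macs_on_route):
    """Build the frame used on each hop; only the MAC addresses change."""
    frames = []
    for this_mac, next_mac in zip(macs_on_route, macs_on_route[1:]):
        frames.append({
            "source_mac": this_mac,
            "destination_mac": next_mac,
            "packet": packet,      # the IP addresses inside are untouched
        })
    return frames

packet = {"source_ip": "127.61.210.88",
          "destination_ip": "42.205.110.140",
          "data": "Packet 1 of 3"}

# Hypothetical MAC addresses: sender, two routers, recipient
route = ["02:00:00:00:00:01", "02:00:00:00:00:02",
         "02:00:00:00:00:03", "02:00:00:00:00:04"]

for hop, frame in enumerate(frames_along_route(packet, route), start=1):
    print(hop, frame["source_mac"], "->", frame["destination_mac"],
          "| destination IP:", frame["packet"]["destination_ip"])
```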


Transferring files with FTP 


File Transfer Protocol (FTP) is a very efficient method used to transfer data across a network, often 
the Internet. FTP works as a high level protocol in the Application layer using appropriate software. 
The user is presented with a file management screen showing the file and folder structure in both the 
local computer and the remote website. Files are transferred simply by dragging them from one area 
to the other. FTP sites may also be used by software companies offering large updates, or by press 
photographers to upload their latest photographs to a remote newspaper headquarters, for example. 
Most FTP sites require a username and password to authenticate the user, but some sites could be 
configured to allow anonymous use without the need for any login information. 
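FTP can also be driven from a program rather than a drag-and-drop client. The following is a minimal sketch using Python's standard ftplib module; the function name list_remote_directory and its parameters are assumptions made for this example, and it is only defined here, not run, because it needs a real FTP server and account.

```python
from ftplib import FTP

def list_remote_directory(host, username, password, directory="/"):
    """Log in to an FTP server and return the file names in one directory."""
    with FTP(host) as ftp:                       # opens the control connection
        ftp.login(user=username, passwd=password)
        ftp.cwd(directory)                       # change to the requested folder
        return ftp.nlst()                        # list of file and folder names
```

On a site that permits anonymous use, calling ftp.login() with no arguments attempts an anonymous login instead.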


[Screenshot: a FileZilla session showing the host, username and password boxes, a log of FTP commands
(PWD, PASV, LIST, CWD) with the server's responses, the local folder structure and the transfer queue]


FileZilla - Open source FTP software
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The role of a mail server in retrieving and sending email 


A mail server acts as a virtual post office for all incoming and outgoing emails. It routes mail
according to its database of local network users' email addresses as it comes and goes, and stores it until
it can be retrieved. Post Office Protocol (v3) (POP3) is responsible for retrieving emails from a mail
server that temporarily stores your incoming mail. When emails are retrieved, they are transferred to your
local computer, be it a desktop or mobile phone, and deleted from the server. As a result, if you are using
different devices to access email via POP3, you will find that they don't synchronise the same emails on
each device. Internet Message Access Protocol (IMAP) is another email protocol that is designed
to keep emails on the server, thus maintaining synchronicity between devices. Simple Mail Transfer
Protocol (SMTP) is used to transfer outgoing emails from one server to another or from an email client
to the server when sending an email.
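The difference in behaviour between POP3 and IMAP can be modelled with a toy Python class. This is only a simulation of the behaviour described above; the class and method names are invented and no real mail protocol is involved.

```python
class MailServer:
    """Toy model of a mailbox held on a mail server."""

    def __init__(self):
        self.inbox = []

    def deliver(self, message):
        self.inbox.append(message)

    def retrieve_pop3(self):
        """POP3 style: hand over the messages and delete them from the server."""
        messages, self.inbox = self.inbox, []
        return messages

    def retrieve_imap(self):
        """IMAP style: hand over a copy and keep the messages on the server."""
        return list(self.inbox)

server = MailServer()
server.deliver("Meeting at 10am")
print(server.retrieve_imap())   # ['Meeting at 10am'] on the phone
print(server.retrieve_imap())   # ['Meeting at 10am'] on the laptop too
print(server.retrieve_pop3())   # ['Meeting at 10am'] downloaded once
print(server.retrieve_pop3())   # [] because the server copy has gone
```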


[Diagram labels: Computer, Mail server, Internet, Mail server, Mail server, Computer]


Exercises 


1. All Internet connected devices communicate via the TCP/IP protocol stack. This has four 
layers — the application, transport, Internet and link layers. 


(a) Describe the roles of each layer when two devices are communicating over the Internet. [8] 
(b) (i) Give the name of one piece of network hardware that operates on the Internet layer. [1]
(ii) Give the name of one piece of network hardware that operates on the Link layer. [1]


2. Major parts of the Internet run on a packet switched network that relies on routers and 
gateways to communicate. 


(a) What is meant by the term packet switching? [2] 


(b) A data packet contains a header and a payload. The header contains data that is used to
route the packet to its destination.


State three data items that might be contained in a data packet's header. [3]


(c) Explain the difference between a router and a gateway. [2] 
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A-Level only 


Chapter 23 — Network security and threats 


Objectives
• Discuss network security and threats
• Discuss use of firewalls, proxies and encryption
• Discuss worms, Trojans and viruses and the vulnerabilities that they exploit


Firewalls 


A firewall is a security checkpoint designed to prevent unauthorised access between two networks,
usually an internal trusted network and an external network that is deemed untrusted, often the Internet.
Firewalls can be implemented in hardware, software or both. A router may contain a firewall.


A typical firewall consists of a separate computer containing two Network Interface Cards (NICs), with
one connected to the internal network, and the other connected to the external network. Using special
firewall software, each data packet that attempts to pass between the two NICs is analysed against
preconfigured rules (packet filters), then accepted or rejected. A firewall may also act as a
proxy server.


Packet filtering 


Packet filtering, also referred to as static filtering, controls network access according to network 
administrator rules and policies by examining the source and destination IP addresses in packet headers. 
If the IP addresses match those recorded on the administrator's 'permitted’ list, they are accepted. Static 
filtering can also block packets based on the protocols being used and the port numbers they are trying 
to access. A port is similar to an airport gate, where an incoming aircraft reaches the correct airport (the 
computer or network at a particular IP address) and is directed to a particular gate to allow passengers 
into the airport, or in this case to download the packet's payload data to the computer. 


Client                 Server
192.168.0.2:1040       24.120.63.37:80
192.168.0.2:468        24.120.63.37:23
192.168.0.2:14         24.120.63.37:67       Established     120


Certain protocols use particular ports. Telnet, for example, is used to remotely access computers and
uses port 23. If Telnet is disallowed by a network administrator, any packets attempting to connect
through port 23 will be dropped or rejected to deny access. A dropped packet is quietly removed,
whereas a rejected packet will cause a rejection notice to be sent back to the sender.
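A static packet filter can be sketched in Python as a pair of simple rule checks. The permitted address and the decision to drop unknown senders but reject Telnet are assumptions chosen for illustration; a real firewall's rule set would be defined by the network administrator.

```python
PERMITTED_SOURCES = {"192.168.0.2"}    # hypothetical 'permitted' list
BLOCKED_PORTS = {23}                   # Telnet is disallowed in this example

def filter_packet(source_ip, destination_port):
    """Return 'accept', 'drop' or 'reject' for a packet header."""
    if source_ip not in PERMITTED_SOURCES:
        return "drop"      # quietly removed, the sender is told nothing
    if destination_port in BLOCKED_PORTS:
        return "reject"    # a rejection notice would be sent back
    return "accept"

print(filter_packet("192.168.0.2", 80))    # accept
print(filter_packet("192.168.0.2", 23))    # reject
print(filter_packet("10.9.8.7", 80))       # drop
```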


Proxy servers 


A proxy server intercepts all packets entering and leaving a network, hiding the true network addresses
of the source from the recipient. This enables privacy and anonymous surfing. A proxy can also maintain
a cache of websites commonly visited and return the web page data to the user immediately without the
need to reconnect to the Internet and re-request the page from the website server. This speeds up user
access to web page data and reduces web traffic. If a web page is not in the cache, then the proxy will
make a request of its own on behalf of the user to the web server using its own IP address and forward
the returned data to the user, adding the page to its cache for other users going through the same proxy
server to access. A proxy server may serve hundreds, if not thousands, of users.
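The caching behaviour of a proxy can be sketched in Python with a dictionary acting as the cache. The class name CachingProxy and the stand-in fetch function are invented for this example; a real proxy would also expire old pages and make genuine network requests.

```python
class CachingProxy:
    """Toy proxy that remembers pages it has already fetched."""

    def __init__(self, fetch):
        self.fetch = fetch      # function that requests a page from the web server
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # Cache miss: the proxy makes the request on the user's behalf
            self.cache[url] = self.fetch(url)
        return self.cache[url]  # cache hit: returned without contacting the server

requests_made = []

def fake_fetch(url):
    """Stand-in for a real web request; records each call."""
    requests_made.append(url)
    return "<html>Page at " + url + "</html>"

proxy = CachingProxy(fake_fetch)
proxy.get("http://example.com/")    # first user: fetched from the web server
proxy.get("http://example.com/")    # second user: served from the cache
print(len(requests_made))           # 1
```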


[Diagram: a proxy server passing requests between users and a web server. The three IP addresses
shown are 72.214.61.117, 210.43.137.40 and 24.120.63.37.]


Proxy servers are often used to filter requests, providing administrative control over the content that users
may request. A common example is a school web-proxy that filters undesirable or potentially unsafe
online content in accordance with the school usage policies. Such proxies may also log user data with
their requests.


Encryption 


Encryption is one way of making messages travelling over the Internet secure. Different encryption 
methods are covered in Section 4, Chapter 15. 


Worms, Trojans and viruses 


Worms, Trojans and viruses are all types of malware or malicious software. They are all designed to 
cause inconvenience, loss or damage to programs, data or computer systems. 


Viruses and worm subclasses 


Viruses and worms have the ability to self-replicate by spreading copies of themselves. A worm is
a sub-class of virus, but the difference between the two is that viruses rely on other host files (usually
executable programs) to be opened in order to spread themselves, whereas worms do not. A worm is
standalone software that can replicate itself without any user intervention. Viruses come in various types 
but most become memory resident when their host file is executed. Once the virus is in memory, any 
other uninfected file that runs becomes infected when it is copied into memory. Other common viruses 
reside in macro files usually attached to word processing and spreadsheet data files. When the data file 
is opened, the virus spreads to infect the template and subsequently other files that you create. Macro 
viruses are usually less harmful than other viruses but can still be very annoying. 


The Cascade virus caused text characters to fall from the top of the screen 


A worm can reside within a data file of another application and will usually enter the computer through a
vulnerability or by tricking the user into opening a file, often an attachment in an email. Rather than simply
infecting other files on your own machine like a virus, a worm can replicate itself and send copies to other
users from your computer, commonly by emailing others in your electronic address book.


Owing to the ability of a worm to copy itself, worms are often responsible for using up bandwidth, system 
memory or network resources, causing computers to slow and servers to stop responding. 


Trojans

A Trojan is so-called after the story of the great horse of Troy, according to which soldiers hid inside a 
large wooden horse offered as a gift to an opposition castle. The castle guards wheeled the wooden 
horse inside their castle walls, and the enemy soldiers jumped out from inside the horse to attack. A 
Trojan is every bit as cunning and frequently manifests itself inside a seemingly useful file, game or 
utility that you want to install on your computer. When installed, the payload is released, often without 
any obvious irritation. A common use for a Trojan is to open a back door to your computer system 
that the Trojan creator can exploit. This can be in order to harvest your personal information, or to use 
your computer power and network bandwidth to send thousands of spam emails to others. Groups of
Internet-enabled computers used like this are called botnets. Unlike viruses and worms, Trojans cannot
self-replicate.


Giovanni Domenico Tiepolo - The Procession of the Trojan Horse in Troy, c.1760 
Malware exploits vulnerabilities in our systems, be they human error or software bugs. People may
switch off their firewalls or fail to renew virus protection, which will create obvious weaknesses in their
systems. Administrative rights can also fail to prevent access to certain file areas which may otherwise be
breached by viral threats. Cracks in software can also open opportunities for attackers, particularly
where data is passed from one function, module or application to another and is assumed to have been
checked and trusted by the source.


System vulnerabilities 


People are often the weakest point in security. Passwords are no guarantee of protection against
unauthorised access since these are sometimes written down, guessed or dishonestly 'blagged' using
social engineering techniques to persuade the password holder to divulge their
authentication credentials.


Protection against threats 


Code quality is a primary vulnerability of systems. Many malware attacks exploit a phenomenon called
'buffer overflow' which occurs where a program accidentally writes values to memory locations too
small to handle them, and inadvertently overwrites the values in neighbouring locations that it is not
supposed to have access to. As a result of a buffer overflow attack, overflow data is often interpreted
as instructions. A virus could be written to take advantage of this by forcing the program to write
something to memory which may consequently alter its behaviour in a way that benefits the attacker.
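Python itself checks the bounds of its data structures, but the effect of a buffer overflow can be simulated with a bytearray standing in for a block of memory. The memory layout, the 'is administrator' flag and the function names are all invented for illustration.

```python
BUFFER_SIZE = 8        # bytes 0-7 hold a user-supplied name
FLAG_ADDRESS = 8       # byte 8 holds a hypothetical "is administrator" flag

def unsafe_write(memory, data):
    """Copy data into the buffer without checking its length (the bug)."""
    for offset, byte in enumerate(data):
        memory[offset] = byte

def safe_write(memory, data):
    """Refuse any input that would not fit in the buffer."""
    if len(data) > BUFFER_SIZE:
        raise ValueError("input is longer than the buffer")
    memory[:len(data)] = data

memory = bytearray(BUFFER_SIZE + 1)        # buffer followed by the flag, all zero
unsafe_write(memory, b"AAAAAAAA\x01")      # 9 bytes written into an 8-byte buffer
print(memory[FLAG_ADDRESS])                # 1: the neighbouring flag was overwritten
```

In a language without bounds checking the write could continue well beyond the neighbouring byte; here Python would raise an IndexError once the end of the bytearray was reached.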


Social engineering, including phishing, is a confidence trick used to persuade individuals to open files,
Internet links and emails containing malware. Spam filtering and education in the use of caution are the
most effective defences against this sort of vulnerability.


Regular operating system and antivirus software updates will also help to reduce the risk of attack.
Virus checkers usually scan for all other malware types and not just viruses, and since new variants are
created all the time to exploit vulnerabilities in systems software, it is vital that your system has the latest
protection. In the worst cases, a lack of monitoring and protection within a company can make
national headlines.


Exercises 


1. A large company network uses a firewall as part of its security.
(a) What is meant by a ‘firewall’? [2]
(b) The company also uses anti-virus software as protection against worms, viruses and Trojans.
(i) Give one reason why the anti-virus software should be kept up-to-date. [1]
(ii) State the difference between worms, viruses and Trojans. [3]
2. Malicious attacks on systems are frequently identified and blocked by various systems.
(a) How might a proxy server reduce the risks of malware attacks on a network? [1]


(c) Explain how the use of a proxy server may make access to websites faster for users. [2]
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Chapter 24 — HTML and CSS 


Objectives 


• Understand HTML and the role of HTML on the World Wide Web
• Understand CSS and the role of CSS in web pages
• Be familiar with various HTML and CSS tags and their functions
• Use inline CSS directly within HTML files using the <style> tag, and with external style sheets


HyperText Markup Language (HTML) 


HTML is the language or script that web pages are written in. It describes the content and structure of a
web page so that a browser is able to interpret and render the page for the viewer. HTML is usually used
in conjunction with Cascading Style Sheets (CSS) which dictate the style and formatting of a web page
rather than its content.


HTML and CSS 
The effects of HTML and CSS on a webpage can be seen left, without CSS styles, and with styles 
[Screenshots: HTML only, without CSS (left); HTML with CSS formatting (right)]


HTML Tags 


HTML is made up of tags written in angle brackets, often in opening and closing pairs,
e.g. <html> and </html>.


A standard web page comprises two sections — a head and a body. The head contains the title of the 
webpage that may appear in a window header or browser tab, and any script that may enrich your page 
content. The body contains the main content of the page, defining text, images and hyperlinks. An HTML
file can be created using a text editor such as Notepad, or using software such as Adobe Dreamweaver. 
[Annotated HTML skeleton]
<html>
<head>
<title> Page Title </title>
</head>
<body>
</body>
</html>


The <head> section contains the page title and any scripts or styles.
The <body> section contains the main HTML page content.


A table of common tags and their function is given below:


HTML Tag      Definition
<p>           A paragraph separated with a line space above and below
<img>         Self closing image tag with parameters:
              <img src="location" height="x" width="y">
<a>           Anchor tag defining a hyperlink with location parameter:
              <a href="location"> Link text </a>
<ol>, <ul>    Defines an ordered (numbered) or unordered (bulleted) list
<li>          Defines an individual list item within either a numbered or bulleted list


The HTML <div> tag 


The <div> tag facilitates the division of a page into separate areas, each of which may be referred to 
uniquely by name, and styled differently using CSS. 


CSS Script 


CSS is a scripting language similar to HTML that is used to describe the layout and styles of a web page. 
Styles can be applied to existing HTML elements such as <h1>, <p> or <div>. 


Embedded, inline and external CSS 


CSS script can be inserted directly into the HTML document <head> as internal or embedded CSS
between its own <style></style> tags. It can also be entered directly within the HTML body,
known as inline CSS, as shown in lines 15 and 19 of the example HTML script overleaf. Either of these
methods enables styles to be kept within the HTML document, and inline CSS can help make one-off style
adjustments that are unlikely to affect any other part of the website. By far the most common technique,
however, is to make style declarations in an external style sheet. A link to the external sheet can be
placed in the HTML file using the <link> tag, for example see line 4 of the HTML script on the following
page. Linking to an external style sheet has the advantage that multiple HTML or webpage files within the
same site can each link to the same style sheet so that formatting can be applied consistently without the
need to duplicate CSS styles.


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Identifiers and classes 
Identifier and class selectors are named ‘hooks' onto which you can hang styles. You can then apply 
these grouped styles to an HTML element such as a <div> element by adding the class or id name as a 
parameter, e.g. 


<div id="page">. 
The styles for the id selector called page are listed within curly brackets within the CSS document: 


#page{max-width:800px; margin: 20px auto; padding: 30px; 
background-color: #cc6633; } 

(Refer to line 8 of the HTML script on the next page, and lines 13-19 of the CSS script overleaf.) Any
HTML content within the page divider will be styled accordingly.


Identifiers 


Identifiers are defined with a hash symbol (#) preceding the id name, e.g. #header (CSS lines 21-26).
Each identifier must be unique within a webpage. In this ‘Germ theory’ example, #header is a good example
of a unique element since a webpage is likely only to contain one header.


Classes 


Classes work in a similar way to an identifier but use a full stop as a prefix to the class name e.g. .list 
(CSS Script lines 35-38). Classes can be used multiple times on a webpage. In the example within this 
chapter, there are two lists which share common formatting unique to the list element such as the font 
colour. This can be defined in the CSS and applied to all list <div> regions on the page. See HTML 
Script lines 22 and 32. 
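The rule that an id may appear only once on a page, while a class may be reused, can be checked automatically. The sketch below uses Python's standard html.parser module; the class name SelectorCounter and the cut-down page are assumptions made for this example.

```python
from collections import Counter
from html.parser import HTMLParser

class SelectorCounter(HTMLParser):
    """Count how often each id and class name is used in a page."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1
            elif name == "class" and value:
                self.classes.update(value.split())   # an element may have several classes

# Cut-down version of the page structure used in this chapter
page = ('<div id="page"><div id="header"></div>'
        '<div class="list"></div><div class="list"></div></div>')

counter = SelectorCounter()
counter.feed(page)
duplicate_ids = [name for name, count in counter.ids.items() if count > 1]
print(duplicate_ids)               # [] so every id is unique
print(counter.classes["list"])     # 2: the class is used twice
```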


[Annotated screenshot of the ‘Germ theory’ page, labelled with: <div id="header">, <div id="page">,
img {border: double 10px white;}, <div id="left-column">, <div id="right-column"> and <div class="list">]
HTML Script
1   <html>
2   <head>
3     <title>Germ Theory</title>
4     <link href="styles.css" rel="stylesheet" type="text/css">
5   </head>
6
7   <body>
8   <div id="page"><!--Opening page-->
9     <div id="header">
10      <h1>Germ Theory</h1>
11    </div>
12    <h2>Vaccination and pasteurisation</h2>
13    <h3>Magic bullets of the 19th-20th Century</h3>
14    <img src="germ_theory.jpg" width="750" height="300" alt="The atmospheric
        germ theory">
15    <p style="margin: 10px 0px;">
16      <a href="https://archive.org/details/b21486308">The atmospheric germ
        theory, 1868 Edinburgh Medical Journal</a>
17    </p>
18
19    <div id="left-column" style="float: left; text-align: left;
20      width: 300px;">
21      <h3>Key figures</h3>
22      <div class="list">
23        <ul>
24          <li>Louis Pasteur (1822-1895)</li>
25          <li>Robert Koch (1843-1916)</li>
26          <li>Paul Erlich (1853-1915)</li>
27        </ul>
28      </div>
29    </div><!--Closing left-column div-->
30    <div id="right-column">
31      <h3>Timeline</h3>
32      <div class="list">
33        <ol>
34          <li>Louis Pasteur developed 'Germ theory' which led to the
            'pasteurisation' of liquids</li>
35          <li>Koch developed Pasteur's Germ theory to create the first
            'magic bullets' or vaccines to attack specific bacteria</li>
36          <li>Erlich then developed a technique for chemically poisoning
            specific bacteria known as 'chemotherapy'</li>
37        </ol>
38      </div>
39    </div><!--Closing right-column div-->
40    <div style="clear:both;"></div>
41  </div> <!--Closing page div-->
42  </body>
43  </html>


Exercises 
1. A website has the following HTML code. 


<html>
<head>
<title>Garden Roses</title>
</head>


<body>
<h1 style="font-family:Arial; color:red">Species</h1>
<p>There are over 100 species of rose.</p>


<!-- Part b -->


<ul>
<li>Climbing roses</li>
<li>Shrub roses</li>
<li>Rambling roses</li>
</ul>
</body>
</html>


(a) Sketch and annotate the website as it would appear in a browser. [4] 


(b) The site owner would like to add a hyperlinked image rose.jpg in place of the comment
<!-- Part b -->. The image would link to the website http://www.roses.com.
Write the code to enable this. [3]


(c) Heading 1 <h1> has had some styles applied using inline CSS. 


(i) Give one advantage of using CSS styles within the HTML document. [1] 
(ii) Give two advantages of using an external CSS style sheet. [2] 
(d) An external CSS style sheet is added to the web page. This contains three rules. Describe
what effect, if any, these rules will have on the appearance of the web page. Where there is no
effect, this should be stated.


(i) body {background-color: lightGreen} [1]
(ii) p.bold {font-weight: bold} [1]
(iii) h1 {text-align: center} [1]


(e) The text within the <ul> tags needs to be styled in green with the intention that any other lists
added to the page share the same style. Explain how this can be achieved. [3]
2. Cascading Style Sheets (CSS) make use of .classes and #identifiers.
(a) Explain the difference between them giving an example of when each might be used. [2]


(b) Explain how a CSS style defined as a class or identifier may be applied to a specific section of
HTML content. [2]
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Chapter 25 — Web forms and JavaScript 


Objectives 
• Be able to add HTML form controls to a web page
• Explain the role of JavaScript inside web pages
• Understand and follow JavaScript syntax
• Write basic JavaScript code for a given scenario
• Use JavaScript to change the content of HTML elements
• Create output, including alert boxes, using JavaScript


Web forms 


Web forms enable websites to collect user input data and selections. Input types include textboxes, 
check boxes and radio buttons, for example. 


[Screenshot: a search form on a train ticket booking website]


Input can be validated and submitted to the website owner's database or processed as part of a search
query to find, for example, train times or your nearest shop branch when you enter a postcode.


Creating a web form using basic HTML form controls 
A simple, unformatted web form that uses basic text boxes for input and a pair of buttons to submit and
reset the page can be created very quickly. It will remain functionless, however, until actions are applied to
it. JavaScript can be used to add behaviours to a web page, including active web forms.


The HTML script below should be compared with the screenshot of the page below. 


<h1>Register</h1>

<form action="process.php" method="post">
<label>Enter your email to register:</label>
<input type="text" id="email" value="" size="40" />
<button type="submit">Submit</button>
<button type="reset">Reset</button>
</form>
[Screenshot: the rendered page showing the heading ‘Register’, the prompt ‘Enter your email to register:’
with a text box, and Submit and Reset buttons]


Form handling with submit and reset actions 


The button type is specified as an attribute of the button, e.g. type="reset". This will provide some 
basic functionality in the case of the reset button which clears the form data. A submit button type will 
send data to a form handler specified in the action attribute of the <form> tag. The form handler on 
the server will then process the form data — in this case, an email address. 
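What a form handler does with the submitted data can be sketched in Python. The form above posts to process.php, so this is not that handler; it is an illustration of the same idea. It assumes the email field is submitted under the name email (a text box needs a name attribute for its value to be sent), and the pattern used is only a rough check, not a full email validator.

```python
import re
from urllib.parse import parse_qs

# Rough pattern: something@something.something with no spaces
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def handle_registration(body):
    """Process the URL-encoded body of a POST request from the form."""
    fields = parse_qs(body)                        # e.g. {'email': ['alice@example.com']}
    email = fields.get("email", [""])[0].strip()
    if not EMAIL_PATTERN.match(email):
        return "invalid"
    return "registered " + email

print(handle_registration("email=alice%40example.com"))   # registered alice@example.com
print(handle_registration("email=not-an-email"))           # invalid
```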


JavaScript 


JavaScript is a scripting language that uses all of the same programming constructs that are familiar in
languages such as Python and VB. It should not be confused with the language Java. JavaScript is
commonly used to add interactivity to websites, including the manipulation of page objects, animations,
navigation tools and form validation. JavaScript is interpreted rather than compiled. Compilers produce
object code which is specific to a particular type of processor. JavaScript needs to be translated
into the object code for whichever computer the browser is running on, and will be translated by the
interpreter when the web page is displayed. An interpreter in the browser reads the JavaScript code,
interprets each line and runs it. Some of the latest browsers, however, use 'Just-In-Time' compilation
which compiles JavaScript into executable bytecode just before execution.


Input 


JavaScript can be used to process input data on the client’s computer. This may change the local page 
interactively or post data to a server. The advantages of processing input data before it is posted to a 
server are that: 


• the local computer can detect and reject erroneous data before it is submitted to a database

• a busy server is relieved from having to process everything itself.
Output


JavaScript can reference and interact with HTML elements to edit, style or move them. For example, a 
validation script may change a ‘postcode’ input label to become red if a user has entered invalid data: 


document.getElementById("postcode").style.color="red";


Using JavaScript to control webpage functions 
Building on the example of a basic web form above, JavaScript can be used to create a simulated 
Captcha form shown below. 
[Screenshot: the "Register with Captcha" form]


The HTML form elements are given ids in order for the JavaScript to reference them. (See lines 16-19 
of the HTML form script below.) Buttons are given onClick attributes in order to execute JavaScript 
functions when they are pressed. Their type has also been changed to become "button" rather than 
submit or reset actions. (See lines 20-21.) 


HTML form script 


1  <!DOCTYPE html>
2  <html>
3  <head>
4  <title>Register with Captcha</title>
5  <style>
6  body, input { font-family: Arial, Helvetica, sans-serif; font-size: 15px; }
7  </style>
8  </head>
9
10 <body>
11
12 <h1>Register with Captcha</h1>
13
14 <form>
15 <div id="captchaImage"> </div> <!-- Empty div to contain random Captcha image -->
16 <label id="captchaPrompt">Enter the word shown above:</label><br />
17 <input type="text" id="captchaResponse" value="" size="40" /><br /><br />
18 <label id="emailPrompt">Enter your email to register:</label><br />
19 <input type="text" id="email" value="" size="40" /><br /><br />
20 <button type="button" onClick="validateForm();">Submit</button>
21 <button type="button" onClick="setupForm();">Reset</button>
22 <input type="hidden" id="captchaAnswer" value="" />
23 </form>
24
JavaScript code
JavaScript functions and commands are added to HTML documents within <script> tags. 


25 <script type="text/javascript">
26   // needs to run when page loads or refreshes
27   function setupForm() {
28     document.getElementById("captchaPrompt").innerHTML='Enter the word shown above:';
29     document.getElementById("captchaPrompt").style.color="black";
30     document.getElementById("captchaResponse").value="";
31     document.getElementById("emailPrompt").innerHTML='Enter your email to register:';
32     document.getElementById("emailPrompt").style.color="black";
33     var captcha=["captcha1.jpg","captcha2.jpg","captcha3.jpg"];
34     var captchaAnswer=["weasel","moose","polecat"];
35     var j=Math.ceil(Math.random() * captcha.length);
36     j--; // JavaScript indexes start at 0 - the count is 3 items, so subtract 1 to get item in range 0-2
37     document.getElementById("captchaImage").innerHTML="<img src='"+captcha[j]+"' width='365' height='180' />";
38     document.getElementById("captchaAnswer").value=captchaAnswer[j];
39   }
40
41   function validateForm() {
42     // validates the captcha
43     if (document.getElementById("captchaResponse").value != document.getElementById("captchaAnswer").value) {
44       document.getElementById("captchaPrompt").style.color="red";
45     } else {
46       // validates the email for an @ character within the string
47       var valid=false;
48       var email=document.getElementById("email").value;
49       //var emailLength=email.length;
50       if(email.indexOf("@") >= 1) {
51         valid=true;
52       }
53       if(valid==true) {
54         alert('Thank you for registering with address: \n' + email);
55         document.getElementById("emailPrompt").style.color="black";
56       } else {
57         document.getElementById("emailPrompt").innerHTML='Enter a valid email to register:';
58         document.getElementById("emailPrompt").style.color="red";
59       }
60
61     }
62   }
63   setupForm();
64 </script>
65
66 </body>
67 </html>
JavaScript output 
JavaScript commands can access and edit HTML elements outside of the <script> tags, and write
directly to the web page document using, for example, the command document.write("Hello World");.
The attribute .innerHTML of an HTML element can be edited directly. (See line 28 above.)

Another method is to cause the browser to display a pop-up alert box with a custom message requiring
the user's attention. Line 54 displays an alert box once the user has submitted valid details.


JavaScript Alert box 


Functions and variables 


The body of a JavaScript function is enclosed within curly brackets {} and the function is called using
its name, e.g. setupForm();. Function parameters may be included inside the round brackets, but if there
are none, empty brackets must be used.


Validation 


Validation routines are commonly built in to webpages using JavaScript since the script is executed 
locally on the client's machine. The function validateForm() checks the user input and either 
changes form labels and styling in response to an invalid entry, or displays the alert box above. 


[Screenshot: the "Register with Captcha" page]
Arrays in JavaScript
JavaScript arrays can hold any type of data. In this example there are two arrays: one to hold a set of
three captcha images and the other to hold the answers to each of them.
var captcha=["captcha1.jpg","captcha2.jpg","captcha3.jpg"];
var captchaAnswer=["weasel","moose","polecat"];


On line 35, a math function generates a random number between 1 and the length of the array (i.e. 3
in this case), and assigns it to variable j. JavaScript array indexes begin at 0, so 1 is subtracted from j
using the simplified command j-- to decrement j by 1 in order to reference array indices 0-2.
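
As an aside, Math.random() returns a value from 0 up to (but not including) 1. In the rare case that it
returns exactly 0, Math.ceil gives 0, and subtracting 1 would produce an index of -1. A common
alternative, sketched below, is to use Math.floor, which maps straight onto the indices 0 to length - 1.
The helper name randomIndex is an assumption made for illustration and is not part of the listing above.

```javascript
// Minimal sketch: pick a random valid index for an array of the given length.
// randomIndex is a hypothetical helper name, not part of the book's listing.
function randomIndex(length) {
  // Math.random() lies in the range [0, 1), so the product lies in [0, length)
  // and Math.floor gives a whole number from 0 to length - 1.
  return Math.floor(Math.random() * length);
}

var captcha = ["captcha1.jpg", "captcha2.jpg", "captcha3.jpg"];
var captchaAnswer = ["weasel", "moose", "polecat"];

// The same index is used for both arrays so that image and answer stay paired.
var j = randomIndex(captcha.length);
console.log(captcha[j] + " -> " + captchaAnswer[j]);
```

Because both arrays are read with the same index, the image and its answer always match.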


Exercises 
1. A website contains JavaScript code.
(a) Describe what is meant by the term JavaScript. [2]
(b) Explain why JavaScript is usually interpreted rather than compiled. [2]


2. The website www.postrates.com offers a rate check service for sending letters and parcels. 
The homepage contains a button hyperlinked to the following webpage: 


1  <!doctype html>
2  <html>
3  <head>
4  <title>Shipping rates</title>
5  </head>
6
7  <body>
8  Calculate shipping rates
9  <script language = "JavaScript">
10 var weight = prompt("Enter the parcel weight in kg", "");
11 var length = prompt("Enter the largest dimension in cm", "");
12
13 if (weight < 1 && length <= 20)
14 {
15   alert("Letter rate: £0.65");
16 }
17
18 if (weight < 1 && length > 20)
19 {
20   alert("Small parcel rate: £1.85");
21 }
22
23 if (weight >= 1)
24 {
25   alert("Large parcel rate: £3.50");
26 }
27 </script>
28
29 </body>
30 </html>

(a) Which lines of code contain JavaScript code? [1] 
(b) Give the identifiers of two variables used in the code. [2]


(c) Looking at the webpage code, what is the purpose of the JavaScript function called 
prompt on line 10? [2] 


(d) When the webpage is requested, what would happen if a parcel weight of 1kg and a
maximum length of 10cm is entered? [2]
Chapter 26 — Search engine indexing


Objectives 

• Understand how web pages are indexed by search engines
• Understand the PageRank algorithm
• Be able to interpret and apply the PageRank algorithm to a given scenario


Search engines 


Search engines such as Google are systems that locate resources on the Internet. These resources could 
be web pages, documents, images or other files. 


Search engine indexing 
Search engines rely on a database or index of web pages to find the pages you are looking for. To build
this index, a software program called a web crawler or spider is used. This continually revisits the
pages already in the index, then fetches the pages that those pages link to, and so on, until it has
reached all or nearly all of the web pages and resources on the Internet. Different search engines use
their own crawler programs, so a search in one engine might return different results from another.
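
The crawling process described above can be sketched in a few lines of JavaScript. This is only an
illustration under stated assumptions: the "web" here is a small in-memory object with made-up
addresses (a.example and so on), whereas a real crawler would fetch each page over HTTP. The names web
and crawl are hypothetical.

```javascript
// A toy "web": each hypothetical address maps to its page text and outbound links.
// This in-memory object is an assumption made so that the sketch is self-contained.
var web = {
  "a.example": { text: "castle hire prices", links: ["b.example", "c.example"] },
  "b.example": { text: "bouncy castle safety", links: ["a.example"] },
  "c.example": { text: "party hire", links: ["d.example"] },
  "d.example": { text: "castle history", links: [] }
};

// Crawl outwards from a starting page, following links breadth-first,
// and build an index that maps each word to the pages containing it.
function crawl(startUrl) {
  var index = {};
  var visited = {};
  var queue = [startUrl];
  while (queue.length > 0) {
    var url = queue.shift();
    if (visited[url] || !web[url]) { continue; } // skip repeats and dead links
    visited[url] = true;
    web[url].text.split(" ").forEach(function (word) {
      if (!index[word]) { index[word] = []; }
      if (index[word].indexOf(url) === -1) { index[word].push(url); }
    });
    // Queue every page that this page links to.
    web[url].links.forEach(function (link) { queue.push(link); });
  }
  return index;
}

var index = crawl("a.example");
console.log(index["castle"]); // the pages on which the word "castle" appears
```

A search for a key word then becomes a simple lookup in the index rather than a scan of every page.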


Key words, meta tags, and descriptions 


Search engines look for key words and phrases within web pages or resource content that match your 
search terms. These are visible to the user and part of the main web page content. 


[Search result: "Tolpuddle Martyrs: Welcome", with the description "Tells the tale of six labourers'
arrest, trial and deportation for unionising, leading to the foundation of modern trade unionism."]


Meta tags and descriptions are a list of keywords or concise phrases specified by the website owner 
that are built into each webpage. Descriptions are displayed with the page title in search results as 
shown above. These can be defined in the HTML documents within the <head> section to help searches. 


<!doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Tolpuddle Martyrs</TITLE>
<META NAME="Keywords" CONTENT="martyr, tolpuddle, farm, worker,
   labourer, dorset, loveless, 1834, union, liberty, australia">
<META NAME="Description" CONTENT="In the 1830s life in rural
   villages like Tolpuddle was hard and getting worse. Farm workers
   could not bear yet more cuts to their pay. Some fought back against
   land owners and formed the first trade unions.">
</head>
<body>
</body>
</html>




A-Level only 
Search results 


There are believed to be over 200 factors affecting search results that may help position your own
website nearer the top of the results list. Other than meta tags and descriptions, these include:

• using keywords in the <title> tag
• the age of your website and date of last update (or frequency of updates)
• the number and relevancy of keywords appearing in <h1> tags and
• the relevancy of the domain name to the content


Google’s PageRank algorithm 


In the 1990s two postgraduate Computer Science students called Larry Page and Sergey Brin met at 
Stanford University. Brin was working on data mining systems and Page was working on a system to 
rank the importance of a research paper according to how often it was cited in other papers. 


The pair realised that this concept could be used to build a far superior search engine to the existing 
ones, and they started to work on a new Search Engine for the Web. The problem they set themselves 
was how to rank the thousands or even millions of web pages that had a reference to the search term 
typed in by a user. To make a search engine useful, the most reliable and relevant pages need to appear 
first in the list of links. 


Until that point, pages had generally been ranked simply by the number of times the search term or
its synonyms appeared on the page. Page's and Brin's insight was to realise that the usefulness and
therefore the rank of a given page, say Page X, can be determined by how many visits to Page X result
from other web pages containing links to the page. Taking this further, links from a Page Y that itself has
a high rank are more significant than those from pages which have themselves only had a few visits.
The importance or authority of a page is also taken into account so that a link from a .gov page or a
page belonging to the BBC site, for example, may be given a higher PageRank rating.


An initial version of Google was launched in August 1996 from Stanford University's website. By
mid-1998 they had 10,000 searches a day, and realised the potential of their invention.


They represented the Web as a directed graph of pages, using an algorithm to calculate the PageRank 
(named after Larry Page) of each page. Every web page is a node and any hyperlinks on the page are 
edges, with the edge weightings dependent on the PageRank algorithm. 


Using PageRank, B has a higher page rank than C because it is a more authoritative source. 


By 2015, Google was processing 40,000 search queries every second, worldwide. David Vise, the author
of The Google Story, noted that "Not since Gutenberg* ... has any new invention empowered individuals,
and transformed access to information, as profoundly as Google."


(* Gutenberg invented the printing press in the fifteenth century)


SECTION 5 — NETWORKS AND WEB TECHNOLOGIES 


A-Level only 


Calculating PageRank 


PageRank is effectively a popularity contest between websites defined by the number of votes or inbound
links they receive, with a weighting to give more importance to some votes than others. This weighting
is swayed by either the number of outbound links a site has or the importance (or PageRank) of a site.
A website with a good reputation and high PageRank will have a higher weighting assigned to its 'votes'
but its total vote is shared or diluted amongst all of the sites it links to.


The PageRank algorithm itself is defined as: 
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))


where: 


• PR(A) is the PageRank of page A

• T1 ... Tn are the pages that link to page A

• C(Tn) is the total count of outbound links from page Tn, including the link to page A.
  All web pages have a notional vote of 1. This is shared between all the pages it links to.

• PR(Tn)/C(Tn) is the share of the vote that page A gets from page Tn. The vote fractions from
  pages T1 ... Tn are added together and multiplied by d.

• d is the damping factor, set to prevent PR(Tn)/C(Tn) from having too much influence. It is
  notionally set to 0.85, which in probability terms says that after roughly six click-through links, the
  average user will either stop their session or enter a new web address in their browser directly rather
  than following another link.


The PageRank of a page is constantly being recalculated and updated. 
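
The formula can be written as a short JavaScript function. This is a minimal sketch rather than
Google's implementation: the function name pageRank and the layout of the inbound array are
assumptions chosen for illustration.

```javascript
// Sketch of PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).
// inbound holds one entry for each page that links to page A:
//   pr       - that page's current PageRank
//   outbound - that page's total count of outbound links
// The function name and data layout are assumptions made for illustration.
function pageRank(inbound, d) {
  var voteTotal = 0;
  for (var i = 0; i < inbound.length; i++) {
    // Each linking page shares its PageRank equally between its outbound links.
    voteTotal += inbound[i].pr / inbound[i].outbound;
  }
  return (1 - d) + d * voteTotal;
}

// A page with no inbound links receives only the (1 - d) term.
console.log(pageRank([], 0.85));

// A page linked to by one page of rank 1 with a single outbound link, and by
// another page of rank 1 that shares its vote between two outbound links.
console.log(pageRank([{ pr: 1, outbound: 1 }, { pr: 1, outbound: 2 }], 0.85));
```

The second call adds vote fractions of 1 and 0.5, giving 0.15 + 0.85 * 1.5, which is approximately 1.425.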


Applying the algorithm

The PageRank of one web page is determined in part by the PageRank of other pages that link to it.
However, the PageRank algorithm works without the need to know any of the other PageRanks of
back-linked pages. (A back-link can be defined as an inbound link from another site.) Instead, a guess
can be made in the first instance and after several iterations of the algorithm, the PageRank begins to
home in on the correct figure. It can take dozens, if not hundreds or even millions of iterations before
this number finally stops moving. Once settled, the average PageRank of all pages will be 1.


Example 1 


In this simplest of examples with a hypothetical world wide web consisting of just two web pages, 
pages A and B would have equal ranking if there is one inbound and one outbound link between them. 


[Diagram: pages A and B, each linking to the other]


This can be calculated using the PageRank algorithm to give an equal ranking of 1:
d = 0.85
PR(A) = (1 - d) + d(PR(B)/1)    PR(A) = 0.15 + 0.85 * 1 = 1
PR(B) = (1 - d) + d(PR(A)/1)    PR(B) = 0.15 + 0.85 * 1 = 1
Example 2 shows the iterative process used to calculate and recalculate the PageRank (PR) of a group of
webpages where the starting point is unknown.


Example 2 


As the number of web pages grows, more complex link structures are created. After the addition of 
one extra web page, the PageRank is recalculated and adjusted to reflect the new pages and links. 


First iteration: (Assumes a PR of 1 for each page where not known.)

d = 0.85
PR(A) = (1 - d) + d(PR(B)/2 + PR(C)/1)    PR(A) = 0.15 + 0.85 * (0.5 + 1) = 1.425
PR(B) = (1 - d) + d(PR(A)/1)              PR(B) = 0.15 + 0.85 * 1.425 = 1.361
PR(C) = (1 - d) + d(PR(B)/2 + PR(D)/1)    PR(C) = 0.15 + 0.85 * (0.681 + 1) = 1.578
PR(D) = (1 - d) + d(0)                    PR(D) = 0.15

Second iteration: (Uses new PR figures from first iteration.)

d = 0.85
PR(A) = (1 - d) + d(PR(B)/2 + PR(C)/1)    PR(A) = 0.15 + 0.85 * (0.681 + 1.578) = 2.07
PR(B) = (1 - d) + d(PR(A)/1)              PR(B) = 0.15 + 0.85 * 2.07 = 1.909
PR(C) = (1 - d) + d(PR(B)/2 + PR(D)/1)    PR(C) = 0.15 + 0.85 * (0.955 + 0.15) = 1.089
PR(D) = (1 - d) + d(0)                    PR(D) = 0.15

Third iteration:

d = 0.85
PR(A) = (1 - d) + d(PR(B)/2 + PR(C)/1)    PR(A) = 0.15 + 0.85 * (0.955 + 1.089) = 1.887
PR(B) = (1 - d) + d(PR(A)/1)              PR(B) = 0.15 + 0.85 * 1.887 = 1.754
PR(C) = (1 - d) + d(PR(B)/2 + PR(D)/1)    PR(C) = 0.15 + 0.85 * (0.877 + 0.15) = 1.023
PR(D) = (1 - d) + d(0)                    PR(D) = 0.15


After three iterations, the PageRank of each page begins to settle. In reality many more iterations 
would be necessary before the figures stop moving, but three iterations get us close enough to 
understand the process and begin to see some results. 


Page A now has a slightly higher ranking than B since it has another vote from page C. Page B has 
a higher rank than pages C and D because it has 100% of the votes from A, a high ranking page in 
itself. Page C has a comparatively moderate ranking since it has two inbound links from other pages 
that also have inbound links. C’s vote from page D however is not given significant importance since 
page D has no inbound links and therefore has a low PageRank. 
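
The iterations in Example 2 can be reproduced with a short script. This is a sketch under stated
assumptions: the link structure (A links to B; B links to A and C; C links to A; D links to C) is
inferred from the formulas in the example, and the names links and iterate are hypothetical. As in the
worked example, each newly calculated value is used immediately by the pages calculated after it.

```javascript
// Link structure assumed from the Example 2 formulas:
// A -> B;  B -> A and C;  C -> A;  D -> C.
var links = { A: ["B"], B: ["A", "C"], C: ["A"], D: ["C"] };
var d = 0.85; // damping factor

// One pass over every page, in the order A, B, C, D. Each page's new value
// is used straight away by the pages calculated after it.
function iterate(pr) {
  Object.keys(links).forEach(function (page) {
    var voteTotal = 0;
    Object.keys(links).forEach(function (other) {
      // Does 'other' link to 'page'? If so, add its share of the vote.
      if (links[other].indexOf(page) !== -1) {
        voteTotal += pr[other] / links[other].length;
      }
    });
    pr[page] = (1 - d) + d * voteTotal;
  });
  return pr;
}

var pr = { A: 1, B: 1, C: 1, D: 1 }; // initial guess of 1 for every page
for (var n = 1; n <= 3; n++) {
  iterate(pr);
  console.log("Iteration " + n + ": " + JSON.stringify(pr));
}
```

The printed figures agree with the worked example to within rounding, since the example rounds its
intermediate values. Running more iterations shows the figures settling, with the total across the four
pages approaching 4 (an average of 1).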
1. The owner of website www.inflatablecastle.com is trying to improve the positioning of his homepage
inflatablecastle.com/index.html in search engine listings. 


(a) Other than PageRank, give three design factors that may affect the company homepage's 
positioning in search results. [3] 


Google's PageRank algorithm PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) calculates a
ranking for each web page that has a significant bearing on search results.


(b) With reference to the diagram below, explain which page is likely to have the highest 
PageRank. You are not expected to perform any calculations. [2] 


[Diagram: a group of linked web pages including inflatablecastle.com]


(c) Looking at the algorithm, what factors directly influence the PageRank of the homepage
index.html at inflatablecastle.com? [2]


(d) PageRank uses a damping factor d in its algorithm. Explain the purpose of d. [2]


2. Search engines provide a listing of all web pages with content relevant to a set of search terms.


(a) Explain how search engines produce this list. [2] 
(b) With reference to the screenshot below, state which line of code contains metatags. [1] 
(c) Briefly explain the purpose of the meta description. [2] 
1  <html>
2  <head>
3  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
4  <TITLE>Fossils</TITLE>
5  <META NAME="Keywords" CONTENT="dinosaur, lyme, regis, limestone,
      ammonite, bone, jurassic, strata, rock, geology, paleontology">
6  <META NAME="Description" CONTENT="Fossils are the preserved remains
      of animals or plants, commonly found embedded in sedimentary
      layers of rock.">
7  </head>
8  <body>
9  </body>
10 </html>
Chapter 27 — Client server and peer-to-peer


Objectives 


• Understand the client-server and peer-to-peer models
• Describe situations where each model may be used
• Explain the difference between client- and server-side processing and the advantages of each
• Identify the different uses of client- and server-side processing and describe situations when one or
  the other may be more practical
• Identify the advantages and disadvantages of client- and server-side processing


Client-server networking 


In a client-server network, one or more computers known as clients are connected to a powerful
central computer known as the server. Each client may hold some of its own files and resources such as
software, and can also access resources held by the server. In a large network, there may be several
servers, each performing a different task.


[Diagram: a client-server network showing client computers, a printer and a print server]


• File server holds and manages data for all the clients
• Print server manages print requests
• Web server manages requests to access the Web
• Mail server manages the email system
• Database server manages database applications


In a client-server network, the client makes a request to the server which then processes the request. 


Advantages of a client-server network 


• Security is better, since all files are stored in a central location and access rights are managed by
  the server

• Backups are done centrally so there is no need for individual users to back up their data. If there is a
  breakdown and some data is lost, recovery procedures will enable it to be restored

• Data and other resources can be shared
Disadvantages of a client-server network
• It is expensive to install and manage

• Professional IT staff are needed to maintain the servers and run the network


Cloud computing 


Cloud computing refers to a growing service-based industry providing access to software or files via the
Internet using the client-server model. File storage companies such as Dropbox, OneDrive or Google
Drive offer file storage facilities where users' files are kept on remote servers. Other companies offer
software via the cloud, a provision known as Software as a Service (SaaS). Microsoft, for example, offers
cloud-based Office applications. Accounting packages are also available through website logins where all
the company data and the application itself are stored offsite.


Peer-to-peer networks 


In a peer-to-peer network, there is no central server. Individual computers are connected to each other,
either locally or over a wide area network so that they can share files. In a small local area network, such
as in a home or small office, a peer-to-peer network is a good choice because:

• it is cheap to set up
• it enables users to share resources such as a printer or router
• it is not difficult to maintain


Peer-to-peer networks are also used by companies providing, for example, video on demand. A problem 
arises when thousands of people simultaneously want to download the latest episode of a particular TV 
show. Using a peer-to-peer network, hundreds of computers can be used to hold parts of the video and 
so share the load. This is the main principle behind dozens of torrent websites that enable the sharing of 
files, often containing copyright material. 
The downside of peer-to-peer networking


Peer-to-peer networking has been widely used for online piracy, since it is difficult to trace the files
which are being illegally downloaded. In 2011, the US Chamber of Commerce estimated that piracy sites 
attracted 53 billion visits each year. The analyst firm NetNames estimated that in January 2013 alone, 
432 million unique Web users actively searched for content that infringes copyright. 


Case study: Piracy sites 


In January 1999, 19-year-old Shawn Fanning and Sean Parker created the Napster software, which 
enabled the peer-to-peer “sharing” of music — in actual fact, the theft of copyright music. Instead of 
storing the MP3 files on a central computer, the songs are stored on users’ machines. When you want 
to download a song using Napster, you are downloading it from another person's machine, which may 
be next door or on the other side of the world. 


All you need is a copy of the Napster utility and an Internet connection. Napster was sued for copyright 
infringement in 2000 but argued that they were not responsible for copyright infringement on other 
people's machines. However, they lost the case and were pushed into bankruptcy, but the service has 
since reinvented itself on a legitimate, subscription basis. 


[Diagram: Napster clients, including your computer, send requests over the Internet to Napster's
central index server; the file transfer itself takes place between the client machines]

The consequences of piracy 


In 2014, Popcorn Time was launched, allowing a decentralised peer-to-peer service for illegal streaming 
of movies. Popcorn Time has already been translated into 32 languages and has been described as a 
“nightmare scenario” for the movie industry. The more movies that are stolen and illegally downloaded 
online, the fewer resources moviemakers have to invest in new films. In 2013 there was a 21% drop in 
the 18-24 age group buying tickets to watch movies, and numbers may plummet further in the next 
few years. 


A 2011 report by the London-based International Federation of the Phonographic Industry (IFPI) 
estimated that 1.2 million European jobs would be destroyed by 2015 in the music, movie, publishing and 
photography industries because of online piracy. 
Client- and server-side processing
In the client-server model, data may be processed on either side; by either client, server or both. 


Web servers 


A client will send a request message to a server which should respond with the data requested or a 
suitable message otherwise. 


[Diagram: HTTP clients each send a request to the HTTP server, which returns a response]


This is commonly seen when a client browser sends an HTTP request to a web server for dynamic web 
page data or a web resource, or when using a web page with an online search facility such as checking 
availability via a booking form. 


[Screenshot: a Skyscanner flight search from London City (LCY) to Ibiza (IBZ), August 2016,
4 travellers, Economy, with options to add nearby airports and to show direct flights only]


The page data is sent back from the HTTP server by way of response and the browser renders the web 
page on the client's computer. 


Client-side processing 


Client-side processing describes situations where data is processed on the client computer, rather than 
on the server. This may happen because the client computer has specific software that can process the 
information, or to lighten the load on the server's processor. Processing data on the client-side can also 
improve security as it can avoid unnecessary data transfer. JavaScript is a client-side language and is 
frequently used to provide interactivity on a web page. Client-side processing can also adjust styles for 
different platforms or screen sizes. 
JavaScript is commonly used for processing data on the client side to validate data entry before it is
sent to the server.


JavaScript validation 


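<script>
// Checks the "arrival" field of the form named "departure" before submission
function validate() {
    var airport = document.forms["departure"]["arrival"];
    if (airport.value == "") {
        // highlight the empty field and stop the form from being submitted
        airport.style.borderColor = "red";
        alert("Departure and arrival airports cannot be left blank.");
        return false;
    }
    return true;
}
</script>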


Server-side processing 


Servers often process an enormous volume of data on behalf of multiple clients. They can also process 
the data much faster than a client computer. There are specific languages that are used for server-side 
processing such as SQL or PHP. Search requests (e.g. for a search engine or a company database) may 
be sent to the server where they may be applied to a database using SQL. Database search results are 
then sent back to the client browser. Validation may also be carried out on the server where an invalid 
entry must be compared with data already on a server database. Examples may include checking user 
credentials, or looking up valid airport locations. JavaScript may also be circumvented maliciously, so
server-side validation is important for the integrity of server data.
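
The idea of repeating validation on the server can be sketched as follows. The example is written in
JavaScript only to stay consistent with the rest of this chapter; a real server might equally use PHP
with an SQL database lookup. The list validAirports, the function validateBooking and the shape of the
request object are all assumptions made for illustration.

```javascript
// Hypothetical list of valid airport codes. In practice this would be
// looked up in a database held on the server.
var validAirports = ["LCY", "IBZ", "LHR"];

// Re-check a submitted booking on the server. Client-side checks can be
// bypassed, so nothing sent by the browser is trusted here.
function validateBooking(request) {
  var errors = [];
  ["departure", "arrival"].forEach(function (field) {
    // Treat a missing field as blank, then tidy up spacing and case.
    var code = String(request[field] || "").trim().toUpperCase();
    if (code === "") {
      errors.push(field + " airport cannot be left blank");
    } else if (validAirports.indexOf(code) === -1) {
      errors.push(field + " airport is not recognised");
    }
  });
  return { valid: errors.length === 0, errors: errors };
}

console.log(validateBooking({ departure: "LCY", arrival: "IBZ" }));
console.log(validateBooking({ departure: "LCY", arrival: "" }));
```

The second check (is the code a recognised airport?) is one that only the server can make reliably,
because the list of valid values is held there.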
API (Application Programming Interface)


An API is a set of protocols (rules) that governs how two applications should interact with one another.
An API sets out the format of requests and responses between a client and a server and enables
one application to make use of the services of another. An organisation may use the Twitter API to
enable relevant tweets to be regularly fed through to a display window within their own website. Price
comparison websites may also use an API to gather data from individual company websites in order to
display a list of each of them for the consumer.
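
The notion of an agreed request and response format can be illustrated with a small sketch. Everything
here is hypothetical: priceApi stands in for a service running on a company's server, and the route
names, prices and message fields are invented for the example. A real API would be called over a
network rather than as a local function.

```javascript
// Hypothetical price data held by a company's server.
var prices = { "LCY-IBZ": 129.99, "LHR-IBZ": 149.5 };

// The "API": it accepts a request in an agreed format ({ route: "..." } as
// JSON text) and always replies in an agreed format, either
// { status, data } or { status, error }, again as JSON text.
function priceApi(requestText) {
  var request = JSON.parse(requestText);
  if (Object.prototype.hasOwnProperty.call(prices, request.route)) {
    return JSON.stringify({
      status: 200,
      data: { route: request.route, price: prices[request.route] }
    });
  }
  return JSON.stringify({ status: 404, error: "Unknown route" });
}

// A client application, such as a comparison site, only needs to know the
// agreed formats. It does not need to know how the prices are stored.
var response = JSON.parse(priceApi(JSON.stringify({ route: "LCY-IBZ" })));
console.log(response.status, response.data.price);
```

Because both sides follow the same rules, the company can change how it stores its prices without
breaking any of the applications that use the API.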


Thin- versus thick-client computing


The 'thickness' of a client computer refers to the level of processing and storage that it does compared
with the server it is connected to. The more processing and storage that a server does, the 'thinner' the
client becomes. If all the processing and storage is done by the server, then all that is required for the
thinnest-client computer is a very basic machine with very little processor power and no storage. This
is often known as a dumb terminal. The decision to go 'thick' or 'thin' rather depends on your specific
requirements and each option comes with its own advantages and disadvantages.


Thin-client

Advantages:
• Easy to set up, maintain and add terminals to a network with little installation required locally.
• Software and updates can be installed on the server and automatically distributed to each client
  terminal.
• More secure since data is all kept centrally in one place.

Disadvantages:
• Reliant on the server, so if the server goes down, the terminals lose functionality.
• Requires a very powerful and reliable server, which is expensive.
• Server demand and bandwidth increased.
• Maintaining network connections for portable devices consumes more battery power than local data
  processing.

Thick-client

Advantages:
• Robust and reliable, providing greater up-time.
• Can operate without a continuous connection to the server.
• Generally better for running more powerful software applications.

Disadvantages:
• Installation of software required on each terminal separately and network administration time is
  increased.
• Integrity issues with distributed data.
Exercises


1. Explain the difference between client-server and peer-to-peer networking, and give an example of 
where each might be used. [6] 


2. A travel agency is planning to install a new computer system based on the client-server model, for
its agents to use for flight and hotel bookings and enquiries at multiple workstations.


(a) What is meant by the client-server model? [2] 
After some consideration, the company has decided to use a thin-client network. 
(b) Explain how a thin-client network operates. [3]


(c) How would the decision to use a thin- rather than thick-client network affect the choice 
of hardware? [2] 


A-Level only 
3. A company is designing a website which will allow its customers to place orders online.


The individual web pages that describe each product will be generated dynamically using 
server-side scripting. 


Explain what a server-side script is. [2] 


4. A website is set up to enable users to access on-demand television programs. Users can sign 
up to the website and download a recent series episode or film. Programs are downloaded and 
stored on the user's device. When others choose to download the same program, parts of the 
program data may come from multiple devices belonging to other users. 


(a) State what this model of network is called. [1] 
(b) (i) Give one advantage to the company of this model. [1]
(ii) Give one advantage to the user of this model. [1]


(c) JavaScript is used to validate that the user's email address is in a valid format when a 
booking is made. 


(i) Give two advantages of client-side validation. [2] 


(ii) Client- and server-side validation should happen in partnership. Explain why it is
important to validate the email address again once it reaches the server. [1]
Section 6


Data types 


In this section: 


Chapter 28 Primitive data types, binary and hexadecimal 
Chapter 29 ASCII and Unicode 

Chapter 30 Binary arithmetic 

Chapter 31 Floating point arithmetic 


Chapter 32 Bitwise manipulation and masks 


CHAPTER 28 — PRIMITIVE DATA TYPES, BINARY AND HEXADECIMAL 


Chapter 28 


Primitive data types, binary and hexadecimal 


Objectives 


• List and define primitive data types
• Represent positive integers in binary and hexadecimal
• Convert between binary, hexadecimal and denary


Primitive data types 


A primitive data type is one which is provided by a programming language. They include: 


* integer     a whole number such as -25, 0, 3, 28679 
* real/float  a number with a fractional part such as -13.5, 0.0, 3.142, 100.0001 
* Boolean     a Boolean variable can only take the value TRUE or FALSE 
* character   a letter, number or special character typically represented in ASCII, such as a, A, 4, 
              ? or %. Note that the character "4" is represented differently in the computer from the 
              integer 4 or the real number 4.0 
* string      anything enclosed in quote marks is a string, for example "Peter", "123", or "This is a 
              string". Either single or double quotes are acceptable. 


All data types are held in the computer in binary, and this chapter describes how integers are 
represented. 
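The list above can be illustrated with a short Python sketch. The variable names are made up for illustration, and Python has no separate character type, so a string of length 1 stands in for a character here.

```python
# Python's built-in types stand in for the primitive data types listed above.
count = 28679          # integer
price = 3.142          # real/float
is_member = True       # Boolean
initial = "A"          # character (a string of length 1 in Python)
name = "Peter"         # string

# The character "4", the integer 4 and the real number 4.0 are different values.
digit_char = "4"
type_names = (type(digit_char).__name__, type(4).__name__, type(4.0).__name__)
print(type_names)        # ('str', 'int', 'float')
print(ord(digit_char))   # 52, the character code of "4", not the number 4
```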


Number bases 


Our familiar decimal (or denary) number system uses the digits 0 through 9 and therefore has a 
base of 10. Binary uses only the digits 0 and 1 and has a base of 2. Hexadecimal uses a base 
of 16, with the digits 0-9 and the letters A to F. A number's base can be written as a subscript to show 
which number system it belongs to. For example, 11₁₀ denotes the number eleven in denary, 11₂ 
denotes a binary value (with a denary equivalent of three) and 11₁₆ denotes a hexadecimal value 
(17 in denary). 
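As a quick check of the idea that the same digits mean different values in different bases, Python's built-in int() accepts a base as its second argument. This is only an illustration; the variable names are arbitrary.

```python
# int(text, base) interprets a string of digits in the given base.
as_denary = int("11", 10)
as_binary = int("11", 2)
as_hex = int("11", 16)
print(as_denary, as_binary, as_hex)   # 11 3 17
```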


The binary number system 


In order to better understand the simplicity of the binary number system, it is a good idea to examine 
how our familiar denary number system works. Columns, right-to-left, represent units, tens and hundreds 
etc. We mentally multiply the values with their column value and add the totals together. 


1000s  100s  10s  1s 
5      0     7    4 
5000 + 70 + 4 = 5074 




The principle is exactly the same in the binary number system. As we move from right to left, each digit is 
worth twice as much as the previous one, instead of ten times as much. 


128 64 32 16 8 4 2 1 
1 1 0 0 1 0 1 1 
128 + 64 + 8 + 2 + 1 =203 


The minimum and maximum values that can be represented in n bits using unsigned binary are 0 and 
2ⁿ - 1 respectively. 


Converting from denary to binary 


To convert a denary number to binary, first write headings of 1, 2, 4, 8... 128 from right to left. (If the 
number is greater than 255, continue writing headings.) 


To convert a denary number, for example 73, into binary, write a 1 under the largest heading that is less 
than or equal to 73 (i.e. 64). You now have 73 - 64 = 9 remaining, to be converted to binary. 9 = 8 + 1 so 
put 1 under 8 and under 1. Fill the spaces with zeros. The binary number representing 73 is 01001001. 


128  64  32  16  8  4  2  1 
0    1   0   0   1  0  0  1   = 01001001 
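The column-heading method above can be sketched in Python as follows. The function name denary_to_binary and its bits parameter are illustrative choices, not part of the specification.

```python
def denary_to_binary(number, bits=8):
    """Return the unsigned binary form of number as a string of `bits` digits."""
    if not 0 <= number < 2 ** bits:
        raise ValueError("number does not fit in the given number of bits")
    digits = ""
    remaining = number
    # Work through the column headings from the largest (e.g. 128) down to 1.
    for position in range(bits - 1, -1, -1):
        heading = 2 ** position
        if heading <= remaining:
            digits += "1"          # this heading "goes into" what is left
            remaining -= heading
        else:
            digits += "0"
    return digits

print(denary_to_binary(73))    # 01001001
print(denary_to_binary(203))   # 11001011
```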


The hexadecimal number system 


The hexadecimal system, often referred to as simply ‘hex’, uses a base of 16 as follows: 


Denary  Hexadecimal  Binary 
0       0            0 
1       1            1 
2       2            10 
3       3            11 
4       4            100 
5       5            101 
6       6            110 
7       7            111 
8       8            1000 
9       9            1001 
10      A            1010 
11      B            1011 
12      C            1100 
13      D            1101 
14      E            1110 
15      F            1111 
16      10           10000 
Converting from binary to hexadecimal and vice versa 


To convert a binary number to hexadecimal, split the binary number into groups of 4 binary digits, 
starting from the right, and translate each group into its hex digit. 


Binary  0011  1010  1111  1010 
Hex     3     A     F     A 
The hex representation of 0011 1010 1111 1010 is therefore 3AFA. 


To convert from hex to binary, perform this operation in reverse by translating each hex digit into a 
group of 4 bits. For example, to convert the number 23₁₆ to binary: 


Hex     2     3 
Binary  0010  0011  = 00100011 
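The grouping method can be sketched in Python as below. The helper names binary_to_hex and hex_to_binary are hypothetical, chosen only for this illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a binary string (spaces allowed) to hex, one group of 4 bits at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left so that the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_binary(hex_string):
    """Convert each hex digit to its own group of 4 bits."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_string.upper())

print(binary_to_hex("0100 1011"))   # 4B
print(hex_to_binary("23"))          # 0010 0011
```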


Converting from hexadecimal to denary and vice versa 


To convert from hexadecimal to denary, remember that the left column now represents 16s and not tens. 
For example, to convert 27₁₆ to denary: 


      16s  1s 
Hex   2    7   = 2 x 16 + 7 = 39 


To convert a denary number to hex, the easiest way is to first convert the denary number to binary and 
then translate from binary to hex. For example, to convert 75₁₀ to hex: 


        128  64  32  16  8  4  2  1 
Binary  0    1   0   0   1  0  1  1 
Hex     4                B 
Therefore 75₁₀ = 4B₁₆. (Alternatively, 75/16 = 4 remainder 11, giving 4B, since 11 is B in hexadecimal.) 
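Both directions can be sketched in Python. This version uses the repeated division by 16 mentioned in the brackets above rather than going through binary; the function names are illustrative only.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_denary(hex_string):
    """Each step multiplies the running total by 16 and adds the next digit's value."""
    total = 0
    for digit in hex_string.upper():
        total = total * 16 + HEX_DIGITS.index(digit)
    return total

def denary_to_hex(number):
    """Repeatedly divide by 16; the remainders are the hex digits, right to left."""
    if number == 0:
        return "0"
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 16)
        digits = HEX_DIGITS[remainder] + digits
    return digits

print(hex_to_denary("27"))   # 39
print(denary_to_hex(75))     # 4B
```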


Why the hexadecimal number system is used 


The hexadecimal system is used as a shorthand for binary since it is simple to represent a byte in just 
two digits, and fewer mistakes are likely to be made in writing a hex number than a string of binary digits. 
It is easier for technicians and computer users to write or remember a hex number than a binary number. 
Colour codes in images often use hexadecimal to represent the RGB values, as they are much easier to 
remember than a 24-bit binary string. For example, #364DB2 represents 36₁₆ for Red, 4D₁₆ 
for Green and B2₁₆ for Blue, which can be displayed or printed in a colour picker window more 
compactly than in binary. 
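As an illustration of the colour code example, the following sketch splits a six-digit hex colour into its three denary values. The function name split_colour is an assumption made for this example.

```python
def split_colour(code):
    """Return the (red, green, blue) denary values of a hex colour code."""
    code = code.lstrip("#")
    # Each pair of hex digits holds one 8-bit colour value.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(split_colour("#364DB2"))   # (54, 77, 178)
```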
Exercises 


1. A school keeps data about each of its pupils. State the most suitable data type for each of the 
following data items: 


Pupil's surname 


A single letter indicating whether they are male or female 


The amount owed for school trips 


The number of school trips they have participated in 


Whether or not the pupil is entitled to free school meals 


2. Represent the denary number 123 in binary using 8 bits. 


3. How many different denary numbers can be represented using 8-bit binary? 


4. What is the hexadecimal equivalent of the denary number 123? 


5. Why are bit patterns often displayed using hexadecimal instead of binary? 


6. Figure 1 shows the contents of a memory location. 


[Figure 1: bit pattern held in the memory location] 


What is the denary equivalent of the contents of this memory location if it represents an 
unsigned binary integer? 


7. What is the hexadecimal equivalent of the binary pattern shown in Figure 1? 


8. Convert the hexadecimal number DA to denary. 


Chapter 29 ASCII and Unicode 


Objectives 


* Define a bit as a 1 or a 0, and a byte as a group of eight bits 
* Know that 2ⁿ different values can be represented with n bits 
* Use names, symbols and corresponding powers of 2 for binary prefixes e.g. Ki, Mi 
* Differentiate between the character code of a denary digit and its pure binary representation 
* Describe how character sets (ASCII and Unicode) are used to represent text 


Bits and bytes 


A bit is the fundamental unit of information in the form of either a single 1 or 0. 1 and 0 are used to 
represent the two electronic states: on and off, or more accurately a switch that is closed (to complete 
a circuit) or open (to break it). A byte is a set of eight bits, for example 0110 1101. One byte holds one 
character of text. 


The number of values that can be represented with n bits is 2ⁿ. Two bits can represent 4 different values: 
00, 01, 10 and 11. Three bits can represent 8 values and four bits can represent 16 different values, 
since 2 x 2 x 2 x 2 = 16. 


Unit nomenclature 


Although we frequently refer to 1024 bytes as a kilobyte, it is in fact a kibibyte. To avoid any confusion 
between references to 1024 bytes rather than 1000 bytes, an international collaboration between 
standards organisations decided in 1996 that kibi would represent 1024, and kilo would represent 1000. 
Kibi is a combination of the words kilo and binary. The same is true of the other familiar names Mega, 
Giga and Tera being replaced by mebi, gibi and tebi. The table below outlines the nomenclature for 
increasing quantities of bytes, in which a KiB is a kibibyte and a MiB, a mebibyte. 


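Name  Symbol  Power  Value                        Name  Symbol  Power 
Kibi  Ki      2¹⁰    1,024                        Kilo  k       10³ 
Mebi  Mi      2²⁰    1,048,576                    Mega  M       10⁶ 
Gibi  Gi      2³⁰    1,073,741,824                Giga  G       10⁹ 
Tebi  Ti      2⁴⁰    1,099,511,627,776            Tera  T       10¹² 
Pebi  Pi      2⁵⁰    1,125,899,906,842,624        Peta  P       10¹⁵ 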
The ASCII code 


Historically, the standard code for representing the characters on the keyboard was ASCII (American 
Standard Code for Information Interchange). This uses seven bits which form 128 different bit 
combinations, more than enough to cover all of the characters on a standard English-language keyboard. 
The first 32 codes represent non-printing characters used for control such as backspace (code 8), the 
Enter or Carriage Return key (code 13) and the Escape key (code 27). The Space character is also 
included as code 32 and Delete as code 127. 
Character form of a denary digit 


Although numbers are represented within the code, the number character is not the same as the actual 
number value. The ASCII code 0110111 will print the character '7', even though the same binary 
value equates to the denary number 55. Therefore ASCII cannot be used for arithmetic and would use 
unnecessary space to store numbers. Numbers for arithmetic are stored as pure binary numbers. 


Joining the characters '7' and '7' gives the text 77, and adding their ASCII codes (0110111 + 0110111) 
gives 1101110, which is 110 in denary. Neither is the arithmetic result 14. 
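The difference between a digit character and its numeric value can be seen in a few lines of Python. The variable names here are illustrative.

```python
seven = "7"                       # the character '7'
code = ord(seven)                 # its character code
code_bits = format(code, "07b")   # the same code as 7 bits

joined = seven + seven            # joining characters, not arithmetic
summed = int(seven) + int(seven)  # convert to integers first, then add

print(code, code_bits)   # 55 0110111
print(joined)            # 77
print(summed)            # 14
```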


The development of ASCII 


ASCII originally used only 7 bits, but an 8-bit version was developed to include an additional 128 
combinations to represent symbols such as é and ©. You can try holding down the ALT key and 
typing in the code number using the number pad to type one of these symbols. For example, ALT+130 
will produce é, as used in café. The 7-bit ASCII code is compatible with the 8-bit code, which simply adds a 
leading 0 to all binary codes. 


Unicode 
By the 1980s, several coding systems had been introduced all over the world that were all incompatible 
with one another. This created difficulty as multilingual data was being increasingly used and a new, 
unified format was sought. As a result, a new 16-bit code called Unicode (UTF-16) was introduced. 
This allowed for 65,536 different combinations and could therefore represent alphabets from dozens of 
languages including Latin, Greek, Arabic and Cyrillic alphabets. The first 128 codes were the same as 
ASCII so compatibility was retained. A further version of Unicode called UTF-32 was also developed to 
include just over a million characters, and this was more than enough to handle most of the characters 
from all languages, including Chinese and Japanese. 


This meant that whilst there is now just one globally recognised system to maintain, one character in this 
scheme uses four bytes instead of two, significantly increasing file sizes and data transmission times. 


Exercises 


1. The ASCII system uses 7 bits to represent a character. The ASCII code in denary for the numeric 
character '0' is 48; other numeric characters follow on from this in sequence. 


(a) Using 7 bits, what is the ASCII code for the character '2' in binary? [1] 
(b) How many different characters can be represented using ASCII? [1] 


2. One character encoding scheme is Unicode. An alternative character encoding scheme is ASCII. 


(a) State one difference between Unicode and ASCII. [1] 


(b) State one advantage and one disadvantage of using ASCII rather than Unicode for 
representing characters. [2] 


3. How many times greater is the storage capacity of a 1 terabyte hard disk drive than that of a 
256 megabyte hard disk drive? 


Show each stage of your working. [2] 
Chapter 30 Binary arithmetic 


Objectives 


* Use sign and magnitude to represent negative numbers in binary 
* Use two's complement to represent negative numbers in binary 
* Add and subtract binary integers 
* Represent fractions in fixed point binary 


Binary addition 


Binary addition works in a similar way to denary addition. If two numbers added together are equal to or 
greater than the base value, (in the case of denary, 10) then the ‘tens’ are carried. In binary, an addition 
that equals 2 or more results in a carry over to the next column. 


In binary, the rules for addition are as follows: 


1. 0 + 0 = 0 
2. 0 + 1 = 1 
3. 1 + 0 = 1 
4. 1 + 1 = 0 Carry 1 (This is 2 in denary or 10 in binary.) 
5. 1 + 1 + 1 = 1 Carry 1 (This is 3 in denary or 11 in binary.) 


Use the following worked example as a guide to where and how each of the rules is implemented. 


Overflow 


In the following example, 8 bits are used to store the result of an addition. The result of the addition is 
greater than 255, and an overflow error occurs where a carry from the most significant bit requires a 
ninth bit. 
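The addition rules and the overflow condition can be sketched together in Python. The function add_binary and the two sample additions are illustrative assumptions, not examples taken from the book.

```python
def add_binary(a, b, bits=8):
    """Add two unsigned binary strings column by column.

    Returns (result, overflow), where overflow is True if a carry is left
    over after the most significant bit.
    """
    a, b = a.zfill(bits), b.zfill(bits)
    carry = 0
    result = ""
    # Work from the rightmost column to the leftmost.
    for column in range(bits - 1, -1, -1):
        total = int(a[column]) + int(b[column]) + carry
        result = str(total % 2) + result   # digit written in this column
        carry = total // 2                 # 1 if the column total is 2 or 3
    return result, carry == 1

print(add_binary("01101010", "00111011"))   # ('10100101', False)  106 + 59 = 165
print(add_binary("11001011", "01100100"))   # ('00101111', True)   203 + 100 needs a ninth bit
```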
Representing negative numbers using sign and magnitude 


One way to represent negative numbers is to make the leftmost bit, called the most significant bit, 
a sign bit. 


* If the most significant bit is zero, the number is positive 
* If the most significant bit is one, the number is negative. 


In essence we are coding a plus sign as 0 and a minus sign as 1. This is known as the sign and 
magnitude representation of binary numbers. For example, using one-byte numbers, 


00000011 = 3 


10000011 = -3 


Binary arithmetic using the sign and magnitude representation does not work as you would expect. A 
much better way of representing numbers in binary is called two’s complement. 


Representing negative numbers using two’s complement 


Two's complement binary works in a similar way to numbers on an analogue counter. Starting from 0000, 
moving the wheel forwards one will create a reading of 0001; turning it back one from 0000 gives a 
reading of 9999, which is interpreted as -1. 


In binary: 


11111101 = -3 
11111110 = -2 
11111111 = -1 
00000000 = 0 
00000001 = 1 
00000010 = 2 
00000011 = 3 


Calculating the range 
The range that can be represented with two's complement using n bits is given by the formula: 


-(2ⁿ⁻¹) to 2ⁿ⁻¹ - 1 


With eight bits, the maximum denary range that can be represented is -128 to 127 because the leftmost 
bit is used as a sign bit to indicate whether a number is negative. If the leftmost bit is a 1, it is a 
negative number. Thus 10000000 represents -128. 
Converting a negative denary number to binary 


Start by working out the binary for the positive equivalent of the number, then flip all of the bits and 
add 1. For example, to convert the denary number -9 to binary: 


Positive binary (9):  00001001 
Flip the bits:        11110110 
Add one:                     1 
Result (-9):          11110111 
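The flip-and-add-one method can be sketched in Python as follows. The function name to_twos_complement and its range check are illustrative additions.

```python
def to_twos_complement(number, bits=8):
    """Return the two's complement bit pattern of number as a string."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= number <= high:
        raise ValueError("number is outside the range for this many bits")
    if number >= 0:
        return format(number, "0{}b".format(bits))
    positive = format(-number, "0{}b".format(bits))                     # positive equivalent
    flipped = "".join("1" if bit == "0" else "0" for bit in positive)   # flip the bits
    return format(int(flipped, 2) + 1, "0{}b".format(bits))             # add one

print(to_twos_complement(-9))     # 11110111
print(to_twos_complement(-3))     # 11111101
print(to_twos_complement(-128))   # 10000000
```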


Converting a negative two’s complement binary number to denary 


The same method works the other way. Flip all of the bits and add 1. Then work out the result in denary 
using the normal method. For example, to convert the binary number 11100101 to denary: 


Binary number:  11100101 
Flip the bits:  00011010 
Add one:               1 
Result:         00011011 = 27, so 11100101 represents -27 
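The reverse conversion can be sketched in the same way. The function name from_twos_complement is an illustrative choice.

```python
def from_twos_complement(bits):
    """Return the denary value of a two's complement bit string."""
    if bits[0] == "0":
        return int(bits, 2)            # positive: read as ordinary binary
    flipped = "".join("1" if bit == "0" else "0" for bit in bits)   # flip the bits
    return -(int(flipped, 2) + 1)      # add one, then attach the minus sign

print(from_twos_complement("11100101"))   # -27
print(from_twos_complement("11110111"))   # -9
```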


Binary subtraction using two’s complement 


Binary subtraction is best done by converting the number to be subtracted into its negative two's 
complement form and then adding it to the first number. For example, denary 17 - 14 would be: 


14         = 00001110 
-14        = 11110010 
17         = 00010001 
17 + (-14) = (1) 00000011 


The carry on the addition is ignored, and the correct answer is given. 
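A minimal Python sketch of subtraction by adding the two's complement is shown below. It works on integers and uses a bit mask to discard the carry; the function name subtract is hypothetical.

```python
def subtract(a, b, bits=8):
    """Compute a - b by adding the two's complement of b; return the bit pattern."""
    mask = 2 ** bits - 1
    negative_b = (~b + 1) & mask     # flip the bits of b and add 1
    total = a + negative_b
    # "& mask" keeps only the lowest `bits` bits, i.e. the carry is ignored.
    return format(total & mask, "0{}b".format(bits))

print(subtract(17, 14))   # 00000011  (3)
print(subtract(14, 17))   # 11111101  (-3 in two's complement)
```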
Fixed point binary numbers 


Fixed point binary numbers can be a useful way to represent fractions in binary. A binary point is used to 
separate the whole place values from the fractional part on the number line: 


8  4  2  1  .  1/2  1/4  1/8  1/16 
0  1  0  1  .  1    1    0    0 


In the binary example above, the left hand section before the point is equal to 5 (4 + 1) and the right hand 
section is equal to 1/2 + 1/4 (3/4), or 0.5 + 0.25 = 0.75. So, using four bits after the point, 0101 1100 is 5.75 
in denary. A useful table with some denary fractions and their equivalents is given below: 


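Binary fraction  Fraction  Denary fraction 
0.1              1/2       0.5 
0.01             1/4       0.25 
0.001            1/8       0.125 
0.0001           1/16      0.0625 
0.00001          1/32      0.03125 
0.000001         1/64      0.015625 
0.0000001        1/128     0.0078125 
0.00000001       1/256     0.00390625 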


Converting a denary fraction to fixed point binary 
To convert the fractional part of a denary number to binary, you can employ the same technique as you 
would when converting any denary number to binary. Take the value and subtract each point value from 
the amount until you are left with 0. Take the example 3.5625 using 4 bits to the right of the binary point: 


Subtract 0.5 from 0.5625:      0.5625 - 0.5 = 0.0625      1 
Subtract 0.25 from 0.0625:     Won't go                   0 
Subtract 0.125 from 0.0625:    Won't go                   0 
Subtract 0.0625 from 0.0625:   0.0625 - 0.0625 = 0        1 


3 = 0011 in binary. 0.5625 = 1001. So 3.5625 = 0011 1001 
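The subtraction method for the fractional part can be sketched in Python. The function name fraction_to_binary and the choice to return the leftover remainder are assumptions made for illustration; a non-zero remainder shows that the fraction could not be stored exactly in the given number of places.

```python
def fraction_to_binary(fraction, places=4):
    """Convert a fraction (0 <= fraction < 1) to `places` binary places.

    Returns (bits, remainder); the remainder is what could not be represented.
    """
    bits = ""
    place_value = 0.5
    for _ in range(places):
        if fraction >= place_value:
            bits += "1"
            fraction -= place_value    # subtract this place value
        else:
            bits += "0"                # "won't go"
        place_value /= 2
    return bits, fraction

print(fraction_to_binary(0.5625))   # ('1001', 0.0)
bits, remainder = fraction_to_binary(0.2)
print(bits, remainder > 0)          # 0011 True  (0.2 is not stored exactly)
```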


It is worth noticing that this system is not only less accurate than the denary system, but some fractions 
cannot be represented at all. 0.2, 0.3 and 0.4, for example, will require an infinite number of bits to the 
right of the point. The number of fractional places would therefore be truncated and the number will not 
be accurately stored, causing rounding errors. In our denary system, two denary places can hold all 
values between .00 and .99. With the fixed point binary system, 2 digits after the point can only represent 
0, 1/4, 1/2 or 3/4 and nothing in between. 


16  8  4  2  1  .  1/2  1/4  1/8 


The range of a fixed point binary number is also limited by the fractional part. For example, if you have 
only 8 bits to store a number to 2 binary places, you would need 2 digits after the point, leaving only 
6 bits before it. 6 bits only gives a range of 0-63. Moving the point one place to the left to improve accuracy 
within the fractional part only serves to halve the range to just 0-31. Even with 32 bits used for each 
number, including 8 bits for the fractional part after the point, the maximum value that can be stored is 
only about 8 million. Another format called floating point binary can hold much larger numbers, with 
greater accuracy. 


Floating point form is covered in the next chapter. 


Exercises 


1. Represent the denary value -19 as an 8-bit two's complement binary integer. 


2. What is the largest positive denary value that can be represented using 8-bit two's 
complement binary? 

3. Describe how 8-bit two's complement binary can be used to subtract one number from another 
number. In your answer show how the calculation 25 - 49 would be completed using the method 
that you have described. 

4. A computer stores the current temperature of a supermarket delivery van. The temperature in °C 
is stored as a two's complement integer using a single byte. 

(a) Convert the freezer temperature value of -19 into binary. 
(b) State the range of temperature values that can be stored using 8 bits. 

5. A memory location contains the value 10101011. What is its denary equivalent if it represents 
a two's complement binary integer? 


CHAPTER 31 — FLOATING POINT ARITHMETIC 


A-Level only 


Chapter 31 — Floating point arithmetic 


Objectives 
* Represent positive and negative numbers with a fractional part in floating point form 
* Normalise un-normalised floating point numbers with positive or negative mantissas 
* Add and subtract floating point numbers 
* Explain underflow and overflow and describe the circumstances in which they occur 


Fixed point binary numbers 


In the last chapter we looked briefly at how numbers with a fractional part can be held in fixed point 
format, which assumes a predetermined number of bits before and after the point. This makes fixed point 
numbers simpler to process but there is a compromise in the range and precision of values that can be 
represented in a given number of bits. Moving the point to the right increases the range but reduces the 
precision, or accuracy, of the fractional part and vice versa. 


16  8  4  2  1  .  1/2  1/4  1/8 


In the example above, only numbers which are multiples of 1/8 can be represented. The value 4.9, for 
example would be ‘rounded’ to 4.875 or 00100111 with three fractional bits to the right of the point. 


2048  1024  512  256  128  64  32  16  8  4  2  1  .  0.5  0.25  0.125  0.0625 


Figure 31.1 


Floating point binary numbers 


Using 32 bits (4 bytes), the largest fixed point number that can be represented with just one bit after the 
point is only just over two billion. Floating point binary allows very large numbers to be represented. 


When ordinary denary numbers become very large, they are written in a more convenient scientific 
notation m x 10ⁿ where m is known as the mantissa or coefficient, and n is the exponent or order of 
magnitude. 5000 can therefore be written as 0.5 x 10⁴, and 42,750,254 can be written as 0.42750254 x 10⁸, 
moving the decimal point eight places to the left. 


This technique can easily be applied to binary numbers too, where the mantissa and exponent are 
represented for example using 12 bits, with 8 bits for the mantissa and 4 bits for the exponent. 

The leftmost bit of both the mantissa and the exponent is a sign bit, with 0 indicating a positive number, 
and 1 a negative number. In a computer, of course, many more bits than this will be used to represent a 
floating point number, with 32-, 64- and 128-bit floating point numbers all being common. 


In all the examples below, eight bits are used for the mantissa and four bits for the exponent. The implied 
binary point is to the right of the sign bit. 


Mantissa    Exponent 
0.1011010   0011 


0.1011010 0011 = 0.101101 x 2³ = 0101.101 = 4 + 1 + 0.5 + 0.125 = 5.625 


To convert the floating point binary number above to denary: 


* Write down the mantissa, 0.1011010 
* Translate the exponent from binary to denary: 0011 = 3. This means that you have to move the point 
  3 places to the right, as the mantissa has to be multiplied by 2³. 
* The binary number is therefore 101.1010 
* Translate this to denary using the table in Figure 31.1. The number is 5.625. 


Negative exponents 
If the exponent is negative, the binary point must be moved left instead of right. 


0.1000000 1110 = 0.1 x 2⁻² = 0.001 = 0.125 


The example above has a positive mantissa of 0.1000000 and a negative exponent of -2. 


* Find the two's complement of the exponent. (Remember that to convert between a positive and a 
  negative binary number using two's complement you must flip the bits and add 1.) 1110 becomes 
  0010, so the exponent is -2 
* Move the binary point of the mantissa two places to the left, to make it smaller. The mantissa is 
  therefore 0.001 (You can ignore the trailing zeros) 
* Translate this to denary with the help of Figure 31.1. The answer is 0.125. 
Handling negative mantissas 


A negative floating point number will have a 1 as the sign bit or MSB (Most Significant Bit) of the mantissa, 
indicating a negative place value. 


1.0101101 0101 = -0.1010011 x 2⁵ = -10100.11 = -20.75 


The example above has a negative mantissa of 1.0101101 and a positive exponent of 0101. 
* Find the two's complement of the mantissa. It is 0.1010011, so the bits represent -0.1010011 
* Translate the exponent to denary, 0101 = 5 
* Move the binary point 5 places to the right to make it larger. The mantissa is -10100.11 
* Translate this to denary with the help of Figure 31.1. The answer is -20.75. 
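The three conversions above (positive exponent, negative exponent and negative mantissa) can be checked with one short Python sketch. It assumes, as in the text, that both parts are held in two's complement and that the binary point sits just after the mantissa's sign bit; the function names are illustrative.

```python
def twos_complement_value(bits):
    """Denary value of a two's complement bit string."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 2 ** len(bits)       # the leading bit has a negative place value
    return value

def floating_point_to_denary(mantissa, exponent):
    """Convert mantissa and exponent bit strings to a denary number."""
    # The binary point follows the sign bit, so divide by 2^(length - 1).
    mantissa_value = twos_complement_value(mantissa) / 2 ** (len(mantissa) - 1)
    return mantissa_value * 2 ** twos_complement_value(exponent)

print(floating_point_to_denary("01011010", "0011"))   # 5.625
print(floating_point_to_denary("01000000", "1110"))   # 0.125
print(floating_point_to_denary("10101101", "0101"))   # -20.75
```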


Normalisation 


Normalisation is the process of moving the binary point of a floating point number to provide the 
maximum level of precision for a given number of bits. This is achieved by ensuring that the first digit after 
the binary point is a significant digit. To understand this, first consider an example in denary. 


In the denary system, a number such as 5,842,130 can be represented with a 7-digit mantissa in many 
different ways: 


0.584213 x 10⁷ = 5,842,130 
0.058421 x 10⁸ = 5,842,100 
0.005842 x 10⁹ = 5,842,000 


The first representation, with a significant (non-zero) digit after the decimal point, has the maximum 
precision. 


A number such as 0.00000584213 can be represented as 0.584213 x 10⁻⁵. 


Normalising a positive binary number 
In binary arithmetic, the leading bit of both mantissa and exponent is the sign bit. 
In normalised floating point form: 
A positive number has a sign bit of 0 and the next digit is always 1. 


This means that the mantissa of a positive number in normalised form always lies between 1/2 and 1. 
Example 1 
Normalise the binary number 0.0001011 0101, held in an 8-bit mantissa and a 4-bit exponent. 


* The binary point needs to move 3 places to the right so that there is a 1 following the binary point. 
* Making the mantissa larger means we must compensate by making the exponent smaller, so subtract 
  3 from the exponent, resulting in an exponent of 0010. 
* The normalised number is 0.1011000 0010 


Normalising a negative binary number 
An unnormalised negative number will have a sign bit of 1 and one or more 1s after the binary point. 


Example 2 
Normalise the binary number 1.1110111 0001, held in an 8-bit mantissa and a 4-bit exponent. 


* Move the binary point right 3 places, so that it is just before the first 0 digit. The mantissa is now 
  1.0111000 
* Moving the binary point to the right makes the number larger, so we must make the exponent smaller 
  to compensate. Subtract 3 from the exponent. The exponent is now 1 - 3 = -2 = 1110 
* The normalised number is 1.0111000 1110 


A normalised negative number has a sign bit of 1 and the next bit is always 0. 
The mantissa of a negative number in normalised form always lies between -1/2 and -1. 
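Both normalisation rules come down to the same test: a mantissa is normalised when its first two bits differ (01 for positive, 10 for negative). The sketch below uses that observation; the function name normalise and the choice to pass the exponent as a denary integer are assumptions made for this illustration.

```python
def normalise(mantissa, exponent):
    """Normalise a two's complement mantissa (bit string) with a denary exponent."""
    if "1" not in mantissa:
        return mantissa, exponent          # zero cannot be normalised
    # While the first two bits are the same, the point can move one place right.
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"      # shift the bits left, filling with 0
        exponent -= 1                      # compensate by reducing the exponent
    return mantissa, exponent

print(normalise("00001011", 5))   # ('01011000', 2)   Example 1
print(normalise("11110111", 1))   # ('10111000', -2)  Example 2
```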


Example 3 
What does the following binary number (with a 5-bit mantissa and a 3-bit exponent) represent in denary? 


0.1111 011 


This is the largest positive number that can be held using a 5-bit mantissa and a 3-bit exponent, and 
represents 0.1111 x 2³ = 7.5 


Example 4 
The most negative number that can be held in a 5-bit mantissa and 3-bit exponent is: 


1.0000 011 


This represents -1.0000 x 2³ = -1000.0 = -8 


Note that the size of the mantissa will determine the precision of the number, and the size of the 
exponent will determine the range of numbers that can be held. 
Converting from denary to normalised binary floating point 


To convert a denary number to normalised binary floating point, first convert the number to fixed 
point binary. 


Example 5 
Convert the number 14.25 to normalised floating point binary, using an 8-bit mantissa and a 
4-bit exponent. 
* In fixed point binary, 14.25 = 01110.010 
* Remember that the first digit after the sign bit must be 1 in normalised form, so move the binary point 
  4 places left and increase the exponent from 0 to 4. The number is equivalent to 0.1110010 x 2⁴ 
* Using a 4-bit exponent, 14.25 = 0.1110010 0100 


Example 6 
If the denary number is negative, calculate the two's complement of the fixed point binary: 
e.g. Calculate the binary equivalent of -14.25 
14.25 = 01110.010 
-14.25 = 10001.110 (two's complement) 


In normalised form, the first digit after the point must be 0, so the point needs to be moved four 
places left. 


10001.110 = 1.0001110 x 2⁴ = 1.0001110 0100 
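The conversion can be sketched in Python. Rather than shifting a fixed point string, this version leans on math.frexp, which splits a number into a fraction between 1/2 and 1 (in size) and a power of 2; that is a shortcut chosen for the sketch, not the method described above. The function names are hypothetical, and values needing more mantissa bits than are available are simply rounded.

```python
import math

def to_bits(value, width):
    """Two's complement bit string of a (possibly negative) integer."""
    return format(value & (2 ** width - 1), "0{}b".format(width))

def denary_to_floating_point(number, mantissa_bits=8, exponent_bits=4):
    """Return (mantissa, exponent) bit strings in normalised form."""
    if number == 0:
        raise ValueError("zero has no normalised form")
    fraction, exponent = math.frexp(number)      # number = fraction * 2**exponent
    scaled = round(fraction * 2 ** (mantissa_bits - 1))
    if scaled == 2 ** (mantissa_bits - 1):       # rounded up to 1.0: use 0.1 x 2^(e+1)
        scaled //= 2
        exponent += 1
    if scaled == -(2 ** (mantissa_bits - 2)):    # exactly -1/2: use -1.0 x 2^(e-1)
        scaled *= 2
        exponent -= 1
    limit = 2 ** (exponent_bits - 1)
    if not -limit <= exponent < limit:
        raise OverflowError("exponent does not fit in the exponent bits")
    return to_bits(scaled, mantissa_bits), to_bits(exponent, exponent_bits)

print(denary_to_floating_point(14.25))    # ('01110010', '0100')
print(denary_to_floating_point(-14.25))   # ('10001110', '0100')
```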


Floating point addition and subtraction 


Before looking at these operations in binary, we can gain an understanding of the principles involved in 
floating point arithmetic by looking at equivalent calculations in denary. 


In denary, when adding two numbers involving decimal points, we first have to line up the points. 
For example:     132.156 
               +   1.0318 
                 133.1878 
In their "normalised form", the two numbers above would be represented as 
.132156 x 10³ and 
.103180 x 10¹ 
Clearly we do not simply add the mantissas, and the same principle holds true in binary. The rules for 
addition and subtraction can be stated as: 
* line up the points by making the exponents equal 
* add or subtract the mantissas 
* normalise the result 
Example 7 


Convert the denary numbers 0.25 and 10.5 to normalised floating point binary form using an 8-bit 
mantissa and a 4-bit exponent. Add together the two normalised binary numbers, giving the result in 
normalised floating point binary form. 


Step 1: The numbers in normalised form are: 


0.1000000 1111 (0.25) and 0.1010100 0100 (10.5) 


Step 2: Write the mantissas with a binary point, and convert the exponents to denary, giving 
0.1000000 exponent -1 and 
0.1010100 exponent 4 
Step 3: Make both exponents 4 and shift the binary points accordingly 
0.0000010 (make the number smaller as you increase the exponent) 
0.1010100 
Step 4: Add the numbers, giving 0.1010110 exponent 4 (In this case it's already normalised) 


Result is 0.1010110 0100 


Example 8 


Subtract the second of the two numbers given below from the first, giving the result in normalised floating 
point binary form. 


0.1000100 0110 - 0.1000010 0101 


Step 1: Convert the exponents to denary, giving 
0.1000100 exponent 6 and 
0.1000010 exponent 5 
Step 2: Make both exponents 6 and shift the binary point of the second number accordingly 
0.1000100 exp 6 
0.0100001 exp 6 (make the number smaller as you increase the exponent) 
Step 3: Find the two's complement of the second number 


1.1011110 + 1 = 1.1011111 
Step 4: Add the numbers 
  0.1000100 
  1.1011111 
(1)0.0100011 exp 6 (ignore the carry) 


Now normalise the number by moving the binary point right 1 place, which increases the 
number, and decrease the exponent by 1 


Result is 0.1000110 0101 
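The three rules (line up the points, add the mantissas, normalise) can be sketched in Python. To keep the sketch short, each mantissa is passed as a signed integer equal to its 8-bit two's complement value (so 0.1000100 is 68), and subtraction is done by adding the negated mantissa. The function name and this integer representation are assumptions made for illustration; bits shifted off the right-hand end are simply lost.

```python
def add_floating_point(m1, e1, m2, e2, mantissa_bits=8):
    """Add two floating point numbers given as (signed mantissa integer, exponent)."""
    # 1. Line up the points: shift the number with the smaller exponent right.
    if e1 < e2:
        m1, e1 = m1 >> (e2 - e1), e2
    else:
        m2, e2 = m2 >> (e1 - e2), e1
    # 2. Add the mantissas.
    total, exponent = m1 + m2, e1
    if total == 0:
        return 0, 0
    top = 2 ** (mantissa_bits - 1)            # 128 for an 8-bit mantissa
    # 3a. If the sum no longer fits in the mantissa, shift right.
    while not -top <= total < top:
        total >>= 1
        exponent += 1
    # 3b. Normalise: shift left until the first two bits differ.
    while -top // 2 <= total < top // 2:
        total <<= 1
        exponent -= 1
    return total, exponent

# Example 7: 0.1000000 exp -1  +  0.1010100 exp 4
m, e = add_floating_point(64, -1, 84, 4)
print(format(m, "08b"), e)    # 01010110 4

# Example 8: 0.1000100 exp 6  -  0.1000010 exp 5  (add the negated mantissa)
m, e = add_floating_point(68, 6, -66, 5)
print(format(m, "08b"), e)    # 01000110 5
```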


Underflow and overflow 


Underflow occurs when a number is too small to be represented in the allotted number of bits. If, for 
example, a very small number is divided by another number greater than 1, underflow may occur and the 
result will be represented by 0. 


Overflow occurs when the result of a calculation is too large to be held in the number of bits allocated. 


Exercises 


1. A normalised floating point representation uses an 8-bit mantissa and a 4-bit exponent, both stored 
using two's complement format. 


(a) This is a floating point representation of a number: 


[Diagram: mantissa and exponent bit pattern] 


Mantissa Exponent 
Calculate the denary number. Show your working. [2] 


(b) Write the normalised representation of the denary value 12.75 in the boxes below: 


[Diagram: empty mantissa and exponent boxes] 


Mantissa Exponent [2] 
(c) Floating point numbers are usually stored in normalised form. 


State two advantages of using a normalised representation. [2] 


2. Convert the following denary numbers to normalised floating point binary form, using an 8-bit 
mantissa and a 4-bit exponent. 


(a) -18.75 
(b) 0.0625 [2] 
Chapter 32 Bitwise manipulation and masks 


Objectives 
* Perform logical, arithmetic and circular shifts on binary data 
* Perform bitwise operations AND, OR and XOR 
* Use masks to manipulate bits 


Logical shift instructions 


All the bits move right or left. A logical shift right causes the least significant bit (lsb) to be shifted into the 
carry bit, and a zero moves into the most significant bit (msb) to occupy the vacated space. 


Example 1 

Before:  1 0 1 1 0 0 0 1 
After:   0 1 0 1 1 0 0 0   (carry bit: 1) 


It is useful for examining the least significant bit of a number. After the operation, the carry bit can be 
tested and a conditional branch executed. 


Example 2 


A logical shift left works in the same way, but the bits move left. The most significant bit (msb) moves into 
the carry bit and a zero moves into the lsb. You can visualise the carry bit as being on the left of the byte. 


Before:                  0 0 0 0 1 1 1 0 
After:   (carry bit: 0)  0 0 0 1 1 1 0 0 


Arithmetic shift instructions 
An arithmetic shift is similar, but it takes into account the sign bit, which always remains the same. 


Example 3 
Shifting right has the effect of dividing by 2. If the sign bit is 1, 1 is moved in from the left instead of 0. 


Before:  1 0 1 1 0 0 0 1 
After:   1 1 0 1 1 0 0 0   (carry bit: 1) 
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Example 4 


Shifting left multiplies by 2. The shift bypasses the sign bit, leaving the msb the same whatever the value 
of the other bits. However, this may result in arithmetic overflow, as shown below.


Before:   carry bit = 0    1011 0001
After:    carry bit = 0    1110 0010


Multiplying two numbers using arithmetic shifts 
Using a combination of shifts and addition, two binary numbers may be multiplied together. 


Example 5 
Multiply 9 by 5 using shifts and addition: 
Multiply 9 x 1:                        0000 1001
Multiply 9 by 4 with 2 left shifts:    0010 0100
Add together:                          0010 1101 = 45
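
The shift-and-add method of Example 5 can be sketched in Python. This is an illustrative sketch only: the function name `shift_add_multiply` is an assumption, and it handles non-negative integers only.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and addition."""
    result = 0
    while b > 0:
        if b & 1:        # lowest bit of b is 1: add the current shifted copy of a
            result += a
        a <<= 1          # shifting a left doubles it
        b >>= 1          # shift b right to examine its next bit
    return result


print(shift_add_multiply(9, 5))   # adds 9 x 1 and 9 x 4
```

Each 1 bit in the second number selects one shifted copy of the first number, which is exactly what happens when 5 (binary 101) selects 9 x 1 and 9 x 4.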


Circular shift instructions 


A rotate or circular shift is useful for performing shifts in multiple bytes. In a circular shift right, the value in 
the least significant bit (lsb) is moved into the carry bit, and the carry bit is moved into the most significant
bit (msb). 

Before:   carry bit = 0    1000 1101


A circular shift right of the bit pattern shown above will result in the following:


After:    carry bit = 1    0100 0110
Example 6 


Assume that R0 and R1, shown below, are two 8-bit registers being used as a double register to hold
a 16-bit binary integer, with R0 holding the high half of the number. Show how a combination of shift
instructions may be used to divide the 16-bit integer by 2.


R0: 0101 1001    carry bit: (empty)    R1: 1101 1000


Answer: First perform an arithmetic shift right on R0. 1 is shifted into the carry bit.


R0: 0010 1100    carry bit: 1    R1: 1101 1000


Then perform a circular shift right on R1. This places the carry bit into the msb of R1, and the carry bit is
replaced with the lsb of R1.


R0: 0010 1100    carry bit: 0    R1: 1110 1100
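
The two-register division of Example 6 can be simulated in Python. This is a sketch under stated assumptions: the helper names are hypothetical, and the register contents (R0 = 0101 1001, R1 = 1101 1000) are read from the example's diagram.

```python
def arithmetic_shift_right(byte):
    """Shift an 8-bit value right, keeping the sign bit. Returns (result, carry)."""
    carry = byte & 1                  # the lsb falls into the carry bit
    sign = byte & 0b10000000          # the sign bit is preserved
    return (byte >> 1) | sign, carry


def rotate_right_through_carry(byte, carry_in):
    """Circular shift right: carry goes into the msb, the lsb goes into the carry."""
    carry_out = byte & 1
    return (byte >> 1) | (carry_in << 7), carry_out


r0, r1 = 0b01011001, 0b11011000       # assumed register contents (high byte, low byte)
r0, carry = arithmetic_shift_right(r0)
r1, carry = rotate_right_through_carry(r1, carry)
print(format(r0, "08b"), format(r1, "08b"), carry)
```

The carry bit is what links the two registers: the bit that falls out of R0 becomes the top bit of R1.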


Logical instructions 
Boolean algebra is covered in Section 8, Chapters 40 and 41. 


The instructions NOT, AND, OR and XOR (exclusive OR) have the following effects: 


            NOT     AND     OR      XOR
Input A     1010    1010    1010    1010
Input B             1100    1100    1100
Result      0101    1000    1110    0110


Explanation: In Boolean logic, 1 represents True and 0 represents False. A NOT instruction has only
one input. If the input is True (i.e. 1), the output is False (i.e. 0), and vice versa. With the AND gate, if both
inputs are True (i.e. 1), the output is True. Otherwise, the output is False. With the OR gate,
if either of the inputs is True (i.e. 1), the output is True. Otherwise, the output is False. With
the XOR gate, if either, but not both, of the inputs is True, the output is True. Otherwise,
the output is False.
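
The four logical instructions correspond to Python's bitwise operators. The sketch below reproduces the 4-bit table; the name `NIBBLE` is an assumption, and the mask is needed because Python integers are not limited to 4 bits.

```python
NIBBLE = 0b1111            # keep results to 4 bits, as in the table

a = 0b1010
b = 0b1100

not_a = ~a & NIBBLE        # mask after NOT, because Python integers are unbounded
and_ab = a & b
or_ab = a | b
xor_ab = a ^ b

for label, value in [("NOT", not_a), ("AND", and_ab), ("OR", or_ab), ("XOR", xor_ab)]:
    print(label, format(value, "04b"))
```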


Masks 
The OR function may be used to set selected bits to 1 without affecting the other bits. 
Example 7 


A system has 8 lights that can be turned ON (output 1) or OFF (output 0), controlled by an 8-bit binary
code. At present, lights 1 to 4 are ON, lights 5 to 8 are OFF. Lights 5 and 6 are to be turned ON. 


Light number 12345678 
Present state 11110000 
OR with 00001100 
Result 11111100 


The AND function may be used to mask particular bits, by setting them to zero. 
Example 8


The ASCII bit pattern for the number “5” is 0011 0101. Convert this to a pure binary number using a 
mask. 


We need to mask out the first four bits. This can be done with an AND operation. 


ASCII “5” 00110101 
AND with 00001111 
Result 00000101 = 5 in binary


The XOR function may be used to invert chosen bits. 


Example 9 
Convert an uppercase letter represented in ASCII to its lowercase equivalent. 


The letter “C”, for example, is 0100 0011 in ASCII. The lowercase letter “c” is 0110 0011. We want to 
change the third bit (counting from the left) from 0 to 1. 


ASCII "C" 01000011 
XOR with 00100000 
Result 01100011 = "c"
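
The three mask techniques of Examples 7 to 9 can be tried out in Python. This is a minimal sketch; the variable names are assumptions chosen for illustration.

```python
# Example 7: OR sets chosen bits to 1 (turn lights 5 and 6 ON)
lights = 0b11110000
lights_on = lights | 0b00001100

# Example 8: AND clears chosen bits (mask out the first four bits of ASCII "5")
digit = ord("5") & 0b00001111

# Example 9: XOR inverts chosen bits (flip the third bit from the left)
lower_c = ord("C") ^ 0b00100000

print(format(lights_on, "08b"), digit, chr(lower_c))
```

Applying the same XOR mask a second time flips the bit back, so the one mask converts in both directions.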

Exercises 


1. An 8-bit word holds the binary pattern 10110010. Start with this bit pattern in each of 
parts (a), (b) and (c). There is no need to show the contents of the carry bit. 


(a) State the contents of the word after a logical left shift of 2 bits. [1] 


(b) Interpreting the word as a number in two's complement form, state the contents of the
word after an arithmetic right shift of 2 bits. [1] 


(c) State the contents of the word after a circular shift left of 3 bits. [1] 
2. In a particular computer, characters are represented in 8 bits using the ASCII code.

The codes for uppercase letters are from 0100 0001 for A to 0101 1010 for Z. 

The codes for lowercase letters are from 0110 0001 for a to 0111 1010 for z. 

Give an 8-bit mask and the appropriate logical operation which will: 

(a) change any uppercase letter into its lowercase equivalent [2] 

(b) change any lowercase letter into its uppercase equivalent. [2]


3. A 32-bit register holds a four byte value. The bytes are numbered so that the first byte is 
leftmost. What mask and logical operator is required to achieve each of the following results: 


(a) complement the second byte [2] 


(b) set the third byte to zero? [2]
Chapter 33 — Arrays, tuples and records


Objectives 


• Be familiar with the concept of a data structure
• Be familiar with arrays of up to 3 dimensions, tuples and records


Data structures 


Computer languages such as Python, Pascal and VB have built-in elementary data types such as 
integer, real, Boolean and char. They also have some built-in structured data types such as string, 
array and record. These are made up of a number of elements of a specified type such as char, integer, 
real or string. 


1-dimensional arrays 


An array is defined as a finite, ordered set of elements of the same type, such as integer, real or char. 
Finite means that there is a specific number of elements in the array. Ordered implies that there is a 
first, second, third etc. element of the array. 


For example (assuming the first element of the array is myArray[0]):
myArray = [51, 72, 35, 37, 0, 3]
x = myArray[2]    #assigns 35 to x
Example 1 


Every year the RSPB organises a Big Garden Birdwatch to involve the public in 
counting the number of birds of different types that they see in their gardens on a 
particular weekend. During 30-31 January 2016, more than 8 million birds were 
counted and reported. 


The scientists add all the sightings together, and once the data has been analysed, 
they can discover trends and understand how different birds and other wildlife are faring. 


An array of strings could be used to hold the names of the birds, and an array of integers to hold the 
results as they come in. As a simple example we will hold the names of 8 birds in an array: 


birdName = ["robin", "blackbird", "pigeon", "magpie", "bluetit",
            "thrush", "wren", "starling"]


We can reference each element of the array using an index. For example: 
birdName[2] = "pigeon" #the index here is 2 

Most languages have a function which will return the length of an array, so that 
numSpecies = len(birdName)


will assign 8 to numSpecies. 
To find at which position of the array a particular bird is, we could use the following pseudocode
algorithm: 


bird = input("Enter bird name: ")
birdFound = False
numSpecies = len(birdName)
for count = 0 to numSpecies - 1
    if bird == birdName[count] then
        birdIndex = count
        birdFound = True
    endif
next count
if birdFound == False then
    print("Bird species not in array")
else
    print("Bird found at", birdIndex)
endif


We need a second array of integers to accumulate the totals of each bird species observed. We can 
initialise each element to zero. 


birdCount = [0,0,0,0,0,0,0,0]

To add 5 to the blackbird count (the second element in the list) we can write the statement

birdCount[1] = birdCount[1] + 5


The following algorithm enables a member of the Birdwatch team to enter results as they come in from 
members of the public. 


birdName = ["robin", "blackbird", "pigeon", "magpie", "bluetit",
            "thrush", "wren", "starling"]
birdCount = [0,0,0,0,0,0,0,0]
bird = input("Please input name of bird (x to end): ")
while bird != "x"
    birdFound = False
    for count = 0 to 7
        if bird == birdName[count] then
            birdFound = True
            birdsObserved = int(input("number observed: "))
            birdCount[count] = birdCount[count] + birdsObserved
        endif
    next count
    if birdFound == False then
        print("Bird species not in array")
    endif
    bird = input("Please input name of bird (x to end): ")
endwhile
#now print out the totals for each bird
for count = 0 to 7
    print(birdName[count], birdCount[count])
next count
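
For comparison, the same tallying idea can be written in Python. This is a sketch rather than the Teacher's Pack program: the function `tally_sightings` and its list-of-pairs input are assumptions, used in place of keyboard input so that the example can be run unattended.

```python
bird_names = ["robin", "blackbird", "pigeon", "magpie", "bluetit",
              "thrush", "wren", "starling"]


def tally_sightings(sightings):
    """Add up (bird, number observed) pairs; return the counts and any unknown names."""
    bird_count = [0] * len(bird_names)      # one total per species, all starting at zero
    unknown = []
    for bird, observed in sightings:
        if bird in bird_names:
            bird_count[bird_names.index(bird)] += observed
        else:
            unknown.append(bird)            # species not in the array
    return bird_count, unknown


counts, unknown = tally_sightings(
    [("blackbird", 5), ("wren", 2), ("blackbird", 1), ("dodo", 1)])
for name, total in zip(bird_names, counts):
    print(name, total)
```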
2-dimensional arrays


An array can have two or more dimensions. A two-dimensional array can be visualised as a table, rather 
like a spreadsheet. 


Imagine a 2-dimensional array called numbers, with 3 rows and 4 columns. Elements in the array can be 
referred to by their row and column number, so that numbers[1,3] = 8 in the example below. 


Example 2 


Write a pseudocode algorithm for a module which prints out the quarterly sales figures (given in integers) 
for each of 3 sales staff named Anna, Bob and Carol, together with their total annual sales. Assume that 
the sales figures are already in the 2-dimensional array quarterSales. The staff names are held in a
1-dimensional array staff. 


staff = ["Anna", "Bob", "Carol"]
quarterSales = [[100,110,120,110],
                [350,355,360,360],
                [200,210,220,220]]
for s = 0 to 2
    annualSales = 0
    #output staff name
    (insert statement here)
    for q = 0 to 3
        print("Quarter ", q, quarterSales[s,q])
        annualSales = annualSales + quarterSales[s,q]
    next q
    print("Annual sales: ", annualSales)
next s
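
In Python, a 2-dimensional array is commonly held as a list of lists, and an element is written `quarter_sales[s][q]` rather than `quarterSales[s,q]`. The sketch below assumes that representation; the function name `annual_totals` is hypothetical.

```python
staff = ["Anna", "Bob", "Carol"]
quarter_sales = [[100, 110, 120, 110],
                 [350, 355, 360, 360],
                 [200, 210, 220, 220]]


def annual_totals(sales):
    """Return one annual total per row (per member of staff)."""
    return [sum(row) for row in sales]


for s in range(len(staff)):
    print(staff[s])
    for q in range(4):
        print("Quarter", q, quarter_sales[s][q])
    print("Annual sales:", annual_totals(quarter_sales)[s])
```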


Arrays of three dimensions 


Arrays may have more than two dimensions. An n-dimensional array is a set of elements of the same 
type, indexed by n integers. In a 3-dimensional array x, a particular element may be referred to as 
x[4,5,2], for example. The first element would be referred to as x[0,0,0]. 
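
A 3-dimensional array can be sketched in Python as a list of lists of lists. The dimensions below (5 x 6 x 3) and the helper name `make_3d` are assumptions chosen so that the element x[4,5,2] exists.

```python
def make_3d(depth, rows, cols, initial=0):
    """Build a depth x rows x cols array with every element set to initial."""
    return [[[initial for _ in range(cols)]
             for _ in range(rows)]
            for _ in range(depth)]


x = make_3d(5, 6, 3)
x[4][5][2] = 99            # written x[4,5,2] in the pseudocode notation
print(x[0][0][0], x[4][5][2])
```

Building each row with its own list comprehension matters: writing `[[0] * cols] * rows` would make every row refer to the same list.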
Tuples


A tuple is an ordered set of values, which could be elements of any type such as strings, integers or real 
numbers, or even graphic images, sound files or arrays. Unlike arrays, the elements do not all have to be 
of the same type. However, a tuple, like a string, is immutable, which means that its elements cannot be 
changed, and you cannot dynamically add elements to or delete elements from a tuple. 


In Python a tuple is written in parentheses, for example: 
pupil = ("John", 78, "a") 

You can refer to individual elements of a tuple, for example: 
name = pupil[0]

but the following statement is invalid: 


pupil[0] = "Mary"
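
The immutability of a tuple can be demonstrated in Python. In this small sketch the flag `changed` is an assumption added for illustration; the attempted assignment raises a TypeError, which is caught.

```python
pupil = ("John", 78, "a")
name = pupil[0]                 # reading an element is allowed

try:
    pupil[0] = "Mary"           # attempting to change an element is not
    changed = True
except TypeError:
    changed = False

print(name, changed, pupil)
```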


Records 


If you want to store data permanently so that you can read or update it at a future date, the data needs
to be stored in a file on disk. The most common way of storing large amounts of data conveniently is to 
use a database, but sometimes you need to create and interrogate your own files. 


Generally, a file consists of a number of records. A record contains a number of fields, each holding 
one item of data. For example, in a file holding data about students, you might have the following 
record structure: 


ID | firstname | surname | dateOfBirth | class
…  | …         | …       | 01/05/2004  | …
…  | Paul      | Gerrard | …           | …
…  | Brian     | Davison | 08/08/2002  | …


The table shows a file containing three records, each record having 5 fields. In some languages, a record 
type will be declared in the following manner: 


studentType = record
    integer ID
    string firstname
    string surname
    date dateOfBirth
    string class
end record


This is an example of a user-defined data type named studentType.

A variable student of type studentType may then be declared as
student : studentType 

Every field in a record can be identified by <recordName>.<fieldName>. 


The surname of the student, for example, would be referred to as student.surname. 
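
Python has no record keyword, but a dataclass gives a similar result. The sketch below is one possible equivalent, not the book's own method: the ID value 1 and the class "12A" are invented for illustration, and the field is named `class_` because `class` is a reserved word in Python.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StudentType:
    ID: int
    firstname: str
    surname: str
    dateOfBirth: date
    class_: str          # "class" cannot be used as a field name in Python


# hypothetical record: only the name and date of birth come from the table
student = StudentType(1, "Brian", "Davison", date(2002, 8, 8), "12A")
print(student.surname)   # fields are accessed as <recordName>.<fieldName>
```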
Exercises


1. Referring to the BirdWatch program given earlier in this chapter: 


(a) Explain why the for... next loop repeated below is not the most efficient type of loop 
in this situation. [1] 


for count = 0 to 7
    if bird == birdName[count] then
        birdFound = True
        birdsObserved = int(input("Enter number of birds observed: "))
        birdCount[count] = birdCount[count] + birdsObserved
    endif
next count


(b) Rewrite the algorithm using a different type of loop. [3]


2. The birth weights in grams of 100 babies, which vary between 1500 to 4000 grams, are held 
in an array weight. 


Write pseudocode for an algorithm which calculates the average birth weight, and then prints 
out the number of babies who are more than 500 grams below the average weight, together 
with the average weight of these. [5] 


3. The marks for 3 assignments, each marked out of 10, for a class of 5 students are to be input 
into a two-dimensional array mark so that mark[3,1], for example, holds the second mark 
achieved by the 4th student. Any missing assignments are given a mark of zero. 


Draw a table representing this array, and fill it with test data. [2] 


Write a pseudocode algorithm which allows the user to enter the marks for the class. 
Calculate the average mark for each student, and the class average. [4] 


4. In a certain game, treasure is hidden in a 10x10 grid. The grid coordinates are given by
grid[row,col] where grid[0,0] represents the top left hand corner and grid[9,9] the
bottom right corner. The grid coordinates of the treasure are signified by a 1 at grid[row,col].
All other grid elements are filled with zeros. 


What is the purpose of the following pseudocode algorithm? [2] 


for row = 0 to 9
    for col = 0 to 9
        if grid[row,col] == 1 then
            print("row ", row, " column ", col)
        endif
    next col
next row


Write pseudocode statements to initialise the grid and “hide the treasure” at a random location 
inside the grid. [5] 
Chapter 34 — Queues


Objectives 


• Understand the concept of an abstract data type
• Be familiar with the concept and uses of a queue
• Describe the creation and maintenance of data within a queue (linear, circular, priority)
• Describe and apply the following to a linear, circular and priority queue
    o add an item
    o remove an item
    o test for an empty queue
    o test for a full queue


Abstract data types 


An abstract data type is one that is created by the programmer, rather than defined within the 
programming language. They include structures such as queues, stacks, trees and graphs. These can 
easily be shown in graphical form, and it is not hard to understand how to perform operations such as 
adding, deleting or counting elements in each structure. However, programming languages require data 
types to represent them. An abstract data type (ADT) is a logical description of how the data is viewed
and the operations that can be performed on it, but how this is to be done is not necessarily known to
the user. It is up to the programmer who creates the data structure to decide how to implement it, and
it may be built in to the programming language. This is a good example of data abstraction, and by
providing this level of abstraction we are creating an encapsulation around the data, hiding the details
of implementation from the user.


As a programmer, you will be quite familiar with this concept. When you call a built-in function such as 
random to generate a random number, or sqrt to find the square root of a number, you are not at all
concerned with how these functions are implemented. 


Queues 


A queue is a First In First Out (FIFO) data structure. New elements may only be added to the end of a
queue, and elements may only be retrieved from the front of a queue. The sequence of data items in a 
queue is determined, therefore, by the order in which they are inserted. The size of the queue depends 
on the number of items in it, just like a queue at traffic lights or at a supermarket checkout. 


Queues are used in a variety of applications: 


• Output waiting to be printed is commonly stored in a queue on disk. In a room full of networked
computers, several people may send work to be printed at more or less the same time. By putting
the output into a queue on disk, the output is printed on a first come, first served basis as soon as
the printer is free.


• Characters typed at a keyboard are held in a queue in a keyboard buffer.


• Queues are useful in simulation problems. A simulation program is one which attempts to model
a real-life situation so as to learn something about it. An example is a program that simulates
customers arriving at random times at the check-outs in a supermarket store, and taking random
times to pass through the checkout. With the aid of a simulation program, the optimum number of
check-out counters can be established.




Operations on a queue 


The abstract data type queue is defined by its logical structure and the operations which can be 
performed on it. It is described as an ordered collection of items which are added at the rear of the 
queue, and removed from the front. 


Index:   0      1       2       3      4
Item:    Eli    Jason   Milly   Bob


front = 0   rear = 3


When Eli leaves the queue, the front pointer is made to point to Jason; the elements themselves do 
not move. When Adam joins the queue, the rear pointer points to Adam. Think of a queue in a doctor's 
surgery — people leave and join the queue, but no one moves chairs. 


Index:   0      1       2       3      4
Item:           Jason   Milly   Bob    Adam


front = 1   rear = 4


The following queue operations are needed: 


• enQueue(item)   Add a new item to the rear of the queue
• deQueue()       Remove the front item from the queue and return it
• isEmpty()       Test to see whether the queue is empty
• isFull()        Test to see whether the queue is full


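Queue operation       | Queue contents             | Return value
(initial state)       | ["Blue", "Red", "Green"]   |
q.deQueue()           | ["Red", "Green"]           | "Blue"
q.enQueue("Yellow")   | ["Red", "Green", "Yellow"] |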
Dynamic vs static data structures
An abstract data type may be implemented using either a dynamic or a static data structure.


A dynamic data structure refers to a collection of data in memory that has the ability to grow or 
shrink in size. It does this with the aid of the heap, which is a portion of memory from which space is 
automatically allocated or de-allocated as required. 


Languages such as Python, Java and C support dynamic data structures, such as the built-in List data 
type in Python. A potential drawback of using a dynamic data structure is that the data structure may 
cause overflow if it exceeds the maximum memory limit, and the program will crash. 


Dynamic data structures are very useful for implementing data structures such as queues when the 
maximum size of the data structure is not known in advance. The queue can be given some arbitrary
maximum to avoid causing memory overflow, but it is not necessary to allocate space in advance. A 
further advantage of using a built-in dynamic data structure such as a list is that many methods or 
functions such as append, remove, length, insert, search and pop may already be written and 
can be used in the implementation of other data structures such as a queue or stack. 
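
As an illustration, a queue can be built on Python's built-in list using append and pop. This is a minimal sketch: the class name `ListQueue` and the optional `max_size` parameter (the arbitrary maximum mentioned above) are assumptions.

```python
class ListQueue:
    """A FIFO queue built on a Python list (a dynamic data structure)."""

    def __init__(self, max_size=None):
        self.items = []
        self.max_size = max_size          # None means no arbitrary maximum

    def is_empty(self):
        return len(self.items) == 0

    def is_full(self):
        return self.max_size is not None and len(self.items) >= self.max_size

    def enqueue(self, item):
        if self.is_full():
            return False                  # item rejected
        self.items.append(item)           # add at the rear
        return True

    def dequeue(self):
        if self.is_empty():
            return None
        return self.items.pop(0)          # remove from the front


q = ListQueue()
for colour in ["Blue", "Red", "Green"]:
    q.enqueue(colour)
removed = q.dequeue()
q.enqueue("Yellow")
print(removed, q.items)
```

Note that `pop(0)` moves every remaining item up one place, so this version behaves like the first of the two linear queue implementations described in the next section.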


A static data structure such as an array is fixed in size, and cannot increase in size or free up
memory while the program is running. An array is suitable for storing a fixed number of items such as the
months of the year, monthly sales or average monthly temperatures. The disadvantage of using an array
to implement a dynamic data structure such as a queue is that the size of the array has to be decided
in advance by the programmer, and if the number of items added fills up the array, then no more can be
added, regardless of how much free space there is in memory. Python does not have a built-in array
data structure.
Implementing a linear queue
There are basically two ways to implement a linear queue in an array or list: 


1. As items leave the queue, all of the other items move up one position in allocated memory so that the
front of the queue is always the first element of the structure, e.g. q[0]. With a long queue, this may
require significant processing time.


2. A linear queue can be implemented as an array with pointers to the front and rear of the queue. An 
integer holding the size of the array (the maximum size of the queue) is needed, as well as a variable 
giving the number of items currently in the queue. However, clearly a problem will arise as many items 
are added to and deleted from the queue, as space is created at the front of the queue which cannot 
be filled, and items are added until the rear pointer points to the last element of the data structure. 


A circular queue


One way of overcoming the limitation of a static data structure such as an array is to implement the
queue as a circular queue, so that when the array fills up and the rear pointer points to the last element
of the array, say q[5], it will be made to point to the first element, q[0], when the next person joins
the queue, assuming this element is empty. This solution requires some extra effort on the part of the
programmer, and is less flexible than a dynamic data structure if the maximum number of items is not
known in advance.
Pseudocode for implementing a circular queue
To initialise the queue:


procedure initialise
    front = 0
    rear = -1
    size = 0
    maxSize = size of array
endprocedure


To test for an empty queue:


function isEmpty
    if size == 0 then
        return True
    else
        return False
    endif
endfunction


To test for a full queue:


function isFull
    if size == maxSize then
        return True
    else
        return False
    endif
endfunction


To add an element to the queue:


procedure enqueue(newItem)
    if isFull then
        print("Queue full")
    else
        rear = (rear + 1) MOD maxSize
        q[rear] = newItem
        size = size + 1
    endif
endprocedure


To remove an item from the queue:


function dequeue
    if isEmpty then
        print("Queue empty")
        item = Null
    else
        item = q[front]
        front = (front + 1) MOD maxSize
        size = size - 1
    endif
    return item
endfunction
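
The circular queue pseudocode translates almost line for line into Python. In this sketch the class and method names are assumptions, a fixed-length list stands in for the array, and a full queue is signalled by returning False rather than printing a message.

```python
class CircularQueue:
    """A circular queue held in a fixed-size list, following the pseudocode above."""

    def __init__(self, max_size):
        self.q = [None] * max_size
        self.front = 0
        self.rear = -1
        self.size = 0
        self.max_size = max_size

    def is_empty(self):
        return self.size == 0

    def is_full(self):
        return self.size == self.max_size

    def enqueue(self, new_item):
        if self.is_full():
            return False
        self.rear = (self.rear + 1) % self.max_size   # wrap round to 0 after the last slot
        self.q[self.rear] = new_item
        self.size += 1
        return True

    def dequeue(self):
        if self.is_empty():
            return None
        item = self.q[self.front]
        self.front = (self.front + 1) % self.max_size
        self.size -= 1
        return item


queue = CircularQueue(3)
for name in ["A", "B", "C"]:
    queue.enqueue(name)
first = queue.dequeue()        # frees slot 0
queue.enqueue("D")             # rear wraps round and reuses slot 0
print(first, queue.q, queue.front, queue.rear)
```

The dequeued item is not erased from the array; it is simply overwritten when the rear pointer wraps round to that slot.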


Priority queues


In some situations where items are placed in a queue, a system of priorities is used. For example an 
operating system might schedule jobs in order of priority, or a printer may give shorter print jobs priority 
over longer ones. 


A priority queue acts like a queue in that items are dequeued by removing them from the front of the 
queue. However, the logical order of items within the queue is determined by their priority, with the 
highest priority items at the front of the queue and the lowest priority items at the back. It is therefore 
possible that a new item joins the queue at the front, rather than at the rear. 


Such a queue could be implemented by checking the priority of each item in the queue, starting at the
rear and moving each item along one place until an item with the same or higher priority is found, at which point
the new item can be inserted behind it.
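
One way to sketch this insertion in Python is shown below. It assumes that a larger number means a higher priority and that the queue is a list of (priority, item) pairs with the front at index 0; the function names are hypothetical.

```python
def priority_enqueue(queue, item, priority):
    """Insert item so that the list stays ordered, highest priority at the front."""
    queue.append(None)                       # make room at the rear
    pos = len(queue) - 1
    # move lower-priority items back one place, starting at the rear
    while pos > 0 and queue[pos - 1][0] < priority:
        queue[pos] = queue[pos - 1]
        pos -= 1
    queue[pos] = (priority, item)            # behind any item of same or higher priority


def priority_dequeue(queue):
    """Remove and return the item at the front of the queue."""
    return queue.pop(0)[1]


jobs = []
priority_enqueue(jobs, "report", 1)
priority_enqueue(jobs, "email", 3)
priority_enqueue(jobs, "photo", 2)
priority_enqueue(jobs, "memo", 3)
print(jobs)
```

Because the scan stops at an item of equal priority, items with the same priority still leave the queue in the order in which they joined.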
Exercises
1. (a) Explain why a queue may be implemented as a circular queue. [2] 


(b) Explain what is meant by a dynamic data structure and why an inbuilt dynamic data
structure in a programming language may be useful in implementing a queue. [2] 


(c) Print jobs are put in a queue to be printed. The queue is implemented in an array, indexed 
from 0, as a circular queue which can hold 5 jobs. Jobs enter the queue in the sequence Job1, 
Job2, Job3, Job4, Job5. Pointers front and rear point to the first and last items in 
the queue respectively. 


(i) Draw a diagram to show how the print jobs are stored. Include pointers in your diagram. [3] 
(ii) Two jobs are printed and leave the queue. Another job, Job6, joins the queue. 


Draw a diagram representing the new situation. [2] 
2. The size of some data structures is fixed when the structure is created.
(a) State the term used to describe such data structures.
Give one example of a type of data structure whose size is always fixed.
Give one advantage of using a fixed size data structure. [3]
(b) A queue data structure has two pointers called front and next which are defined as:
front points to the first item in the queue 


next points to the next available space 


The queue is defined as a first in, first out (FIFO) data structure. 
(i) State the condition of the pointers when the queue is empty. [1]
(ii) Write an algorithm to remove one data item from a queue. [4] 


(c) The queue may be represented by a fixed size data structure. 


data structure 


front next 


Explain, with the aid of a diagram, what happens when attempting to add 3 data 
items to the queue. [5]


OCR F453-01 Qu 5 June 2012 
Chapter 35 — Lists and linked lists


Objectives 


• Explain how a list may be implemented as either a static or dynamic data structure
• Show how items may be added to or deleted from a list
• Describe the linked list data structure
• Show how to create, traverse, add data to and remove data from a linked list


Definition of a list 


In computer science, a list is an abstract data type consisting of a number of items in which the same 
item may occur more than once. The list is sequenced so we can refer to the first, second, third,... item and
we can also refer to the last element of the list. 


A list is a very useful data type for a wide variety of operations, and can be used, for example, to 
implement other data structures such as a queue, stack or tree. Some languages such as Python have a 
built-in list data type, so that for example a list of numbers could be shown as 


[45, 13, 19, 13, 8] 


Operations on lists 
Some possible list operations are shown in the following table. The list a is assumed to hold the values 
[45, 13, 19, 13, 8] initially, with the first element referred to as a[0].


List operation   | Description                                      | Example       | List contents           | Return value
isEmpty()        | Test for empty list                              | a.isEmpty()   | [45, 13, 19, 13, 8]     | False
append(item)     | Add a new item to the end of the list            | a.append(33)  | [45, 13, 19, 13, 8, 33] |
remove(item)     | Remove the first occurrence of an item from list | a.remove(13)  | [45, 19, 13, 8, 33]     |
search(item)     | Search for an item in list                       | a.search(22)  | [45, 19, 13, 8, 33]     | False
length()         | Return the number of items                       | a.length()    | [45, 19, 13, 8, 33]     | 5
index(item)      | Return the position of item                      | a.index(8)    | [45, 19, 13, 8, 33]     | 3
insert(pos,item) | Insert a new item at position pos                | a.insert(2,7) | [45, 19, 7, 13, 8, 33]  |
pop()            | Remove and return the last item in list          | a.pop()       | [45, 19, 7, 13, 8]      | 33
pop(pos)         | Remove and return the item at position pos       |               |                         |
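
Most of these operations exist directly on Python's built-in list. The sketch below follows the same sequence of operations; note that Python has no `search` or `length` methods, so `in` and `len()` are used instead, and the final `pop(1)` is an extra example added here for illustration.

```python
a = [45, 13, 19, 13, 8]

is_empty = len(a) == 0      # isEmpty()
a.append(33)                # [45, 13, 19, 13, 8, 33]
a.remove(13)                # removes the first 13 only
found = 22 in a             # search(22)
length = len(a)             # length()
position = a.index(8)       # index(8)
a.insert(2, 7)              # [45, 19, 7, 13, 8, 33]
last = a.pop()              # removes and returns the last item
second = a.pop(1)           # removes and returns the item at position 1

print(a, last, second)
```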
Using an array


It is possible to maintain an ordered collection of data items using an array, which is a static data 
structure. This may be an option if the programming language does not support the list data type and 
if the maximum number of data items is small, and is known in advance. 


The programmer then has to work out and code algorithms for each list operation. The empty array 
must be declared in advance as being a particular length, and this could be used, for example, to hold a 
priority queue. 


Inserting a new name in the list 


If the list needs to be held in sequential order in the array, the algorithm could first determine where a new 
item has to be added, and then if necessary, start at the end of the list and move the rest of the items
along in order to make room for it. 


| Holly | James | Nathan | Paul | Sophie |        |


The steps are as follows: 


Test for list already full, print message if it is and quit 
Determine where new item needs to be inserted 

Starting at the end of the list, move other items along one place 
Insert new item in correct place 
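
The four steps above can be sketched in Python, with a fixed-length list standing in for the array. The function name `insert_in_order`, the `count` parameter (the number of slots in use) and the use of "" for an empty slot are assumptions.

```python
def insert_in_order(names, count, new_name):
    """Insert new_name into the first count slots of names, keeping them in order.
    Returns the new number of items."""
    if count == len(names):                  # test for list already full
        print("List is full")
        return count
    pos = 0
    while pos < count and names[pos] < new_name:
        pos += 1                             # find where the new item belongs
    for i in range(count, pos, -1):          # start at the end of the list
        names[i] = names[i - 1]              # move items along one place
    names[pos] = new_name                    # insert new item in correct place
    return count + 1


names = ["Holly", "James", "Nathan", "Paul", "Sophie", ""]
count = insert_in_order(names, 5, "Ken")
print(names, count)
```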


Deleting a name from the list 


Suppose the name Ken is to be deleted from the list shown below. The names coming after Ken in the 
list need to be moved up to fill the gap. 


| Holly | James | Ken | Nathan | Paul | Sophie |
First, items are moved up to fill the empty space by copying them to the previous spot in the array:


| Holly | James | Nathan | Paul | Sophie | Sophie |


Finally the last element, which is now duplicated, is replaced with a blank. 


| Holly | James | Nathan | Paul | Sophie |        |
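
Deletion can be sketched in the same style. As before, the function name `delete_name`, the `count` parameter and the use of "" as the blank are assumptions made for illustration.

```python
def delete_name(names, count, target):
    """Delete target from the first count slots of names. Returns the new count."""
    pos = 0
    while pos < count and names[pos] != target:
        pos += 1                             # look for the item to delete
    if pos == count:
        return count                         # not found: nothing to do
    for i in range(pos, count - 1):
        names[i] = names[i + 1]              # copy each later item to the previous spot
    names[count - 1] = ""                    # blank out the duplicated last element
    return count - 1


names = ["Holly", "James", "Ken", "Nathan", "Paul", "Sophie"]
count = delete_name(names, 6, "Ken")
print(names, count)
```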
Linked lists


Definition 
A linked list is a dynamic data structure used to hold an ordered sequence, as described below: 


• The items which form the sequence are not necessarily held in contiguous data locations, or in the
order in which they occur in the sequence


• Each item in the list is called a node and contains a data field and a next address field called a link
or pointer field (the data field may consist of several subfields)


• The data field holds the actual data associated with the list item, and the pointer field contains the
address of the next item in the sequence


• The link field in the last item indicates that there are no further items by the use of a null pointer


• Associated with the list is a pointer variable which points to (i.e. contains the address of) the first
node in the list


Operations on linked lists
In the examples which follow we will assume that the linked list is held in memory in an array of records, 
and that each node consists of a person’s name (the data field) and a pointer to the next item in the list. 


We will explore how to set up or initialise an empty list, insert new data in the correct place in the list,
delete an unwanted item and print out all items in the list. We will also look at the problem of managing 
the free space in the list. 


A node record may be defined like this: 


type nodeType
    string name
    integer pointer
endType


dim Names[0..5] of nodeType


Initialising a linked list 


We need to keep two linked lists; one for the actual data, and one for the free space. When a new item 
is added, it is put in the node pointed to by nextfree. When a node is deleted, it is linked into the free
space list. 
The array is initialised prior to entering any names, and it will consist of just one linked list of free space.
After initialisation, nextfree points to the first free space in the list, Names[0].


A pointer named start will point to the first data item in the list. This will be initialised to null, 
indicating that the list is empty. The last item in the free space list also has a pointer of null, indicating 
that this is the last available free space in the list. 


The array holding the linked list now looks like this: 


Figure 35.1 
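
The initialisation described above can be sketched in Python. This sketch makes several assumptions: each node is held as a dictionary, -1 stands for the null pointer, and the function name `initialise` is hypothetical.

```python
NULL = -1      # assumed representation of the null pointer


def initialise(size):
    """Set up an array of empty nodes linked together as one free-space list."""
    names = [{"name": "", "pointer": i + 1} for i in range(size)]
    names[size - 1]["pointer"] = NULL     # last free node ends the free-space list
    start = NULL                          # the data list is empty
    nextfree = 0                          # first free space is Names[0]
    return names, start, nextfree


names, start, nextfree = initialise(6)
for index, node in enumerate(names):
    print(index, repr(node["name"]), node["pointer"])
```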


After the names Browning, Turner, Johnson and Cray have been added, the array will look like this: 


Index | Name     | Pointer
0     | Browning | 3
1     | Turner   | null
2     | Johnson  | 1
3     | Cray     | 2
4     |          | 5
5     |          | null

start = 0
nextfree = 4


Figure 35.2 


Notice that we now have two linked lists going; the list linking the nodes containing names and the list 
linking the free nodes. 


• a pointer start points to the first item in the list
• nextfree is a pointer to the next free location in the array
• the free spaces in the array are organised as a linked list
• names can be retrieved in alphabetical order by following the links


Inserting an item 
We'll now work out an algorithm for inserting a name into the middle of the list. As an example, we'll 
insert Mortimer between Johnson and Turner. The pointers will have to be changed so that it is linked into 
the correct place. 
After insertion of Mortimer, the list will appear as in Figure 35.3.


Index | Name     | Pointer
0     | Browning | 3
1     | Turner   | null
2     | Johnson  | 4
3     | Cray     | 2
4     | Mortimer | 1
5     |          | null

start = 0
nextfree = 5


Figure 35.3


Here are the steps: 


store the new name Mortimer in the node pointed to by nextfree
determine, by following links, where the new item should be linked in
change nextfree to point to the next free location
change Mortimer's pointer to point to Turner
change Johnson's pointer to point to Mortimer


Diagrammatically, this is what we have done:


Before insertion:

start = 0, nextfree = 4

[0] Browning | 3  ->  [3] Cray | 2  ->  [2] Johnson | 1  ->  [1] Turner | null


After insertion:

start = 0, nextfree = 5

[0] Browning | 3  ->  [3] Cray | 2  ->  [2] Johnson | 4  ->  [4] Mortimer | 1  ->  [1] Turner | null


Figure 35.4 


Extra steps will need to be added to the algorithm to cope with the special cases of inserting a
name at the very front of the list (e.g. Allen), or inserting the first name into an empty list.

Before we go further and express this algorithm in more formal pseudocode, you need to make sure you
clearly understand the notation used.


Names[p].name holds the name in node [p], that is, the node pointed to by p

Names[p].pointer holds the value of the pointer in node [p]


Notice how you can ‘peek ahead’ using the pointers to see what name is in the next node, or even the 
node after that one, and so on. 


This is crucial because you need to know where you have come from (the previous node) when you get
to the node that has a name "greater" than the new one to be inserted.


Here's a simplified algorithm to add a new name to the list. The complications of inserting at the head of 
a list and dealing with a full list are dealt with in the algorithm on the next page. 


The comments in the algorithm refer to inserting the name Mortimer in the linked list shown in Figures
35.2 and 35.4.


Names[nextfree].name = newName              //store name in next free node
p = start
follow pointers until Names[p].pointer points to a name > new name
temp = nextfree                             //put 4 in temp (Step 1)
nextfree = Names[temp].pointer              //put 5 in nextfree (Step 2)
Names[temp].pointer = Names[p].pointer      //put 1 in Mortimer's pointer field (Step 3)
Names[p].pointer = temp                     //put 4 in Johnson's pointer field (Step 4)


Diagrammatically:

[Diagram labelling temp, nextfree, Names[temp].pointer and Names[p].pointer]

Figure 35.5 


Pseudocode algorithm for inserting an item 


The following algorithm copes with a full list and the special case of inserting an item at the front of the 
list. It also manages the free space list. 


procedure AddItem(newName)
    // check if list is full and if so, print error message
    if nextfree == null then
        print("List full")
    else
        Names[nextfree].name = newName
        if start == null then                       // empty list
            temp = Names[nextfree].pointer          // save pointer
            Names[nextfree].pointer = null
            start = nextfree
            nextfree = temp
        else
            p = start
            if newName < Names[p].name then         // insert at front of list
                temp = Names[nextfree].pointer      // save pointer to the next free node
                Names[nextfree].pointer = start
                start = nextfree
                nextfree = temp
            else
                placeFound = False                  // general case
                while Names[p].pointer != null and placeFound == False
                    // peek ahead
                    if newName >= Names[Names[p].pointer].name then
                        p = Names[p].pointer
                    else
                        placeFound = True
                    endif
                endwhile
                temp = nextfree
                nextfree = Names[temp].pointer      // update nextfree...
                Names[temp].pointer = Names[p].pointer
                Names[p].pointer = temp             // ...and link the new node into the list
            endif
        endif
    endif
endprocedure
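
The insertion algorithm above can be sketched in Python. This is a minimal illustration rather than the book's own program: the class name ArrayLinkedList, the method names and the choice of a six-node array are assumptions, each node is held as a two-item list [name, pointer], and None plays the role of null.

```python
class ArrayLinkedList:
    """Alphabetically ordered linked list held in a fixed-size array."""

    def __init__(self, size):
        # Initially every node is on the free space list: 0 -> 1 -> ... -> None
        self.names = [[None, i + 1] for i in range(size)]
        self.names[size - 1][1] = None
        self.start = None      # no data items yet
        self.nextfree = 0      # first free node

    def add_item(self, new_name):
        if self.nextfree is None:
            print("List full")
            return
        new = self.nextfree
        self.nextfree = self.names[new][1]   # take the node off the free list
        self.names[new][0] = new_name
        if self.start is None or new_name < self.names[self.start][0]:
            # empty list, or insert at the front of the list
            self.names[new][1] = self.start
            self.start = new
            return
        p = self.start
        # peek ahead until the next name is greater than the new one
        while (self.names[p][1] is not None
               and new_name >= self.names[self.names[p][1]][0]):
            p = self.names[p][1]
        self.names[new][1] = self.names[p][1]
        self.names[p][1] = new

    def in_order(self):
        """Follow the pointers from start and collect the names."""
        result = []
        p = self.start
        while p is not None:
            result.append(self.names[p][0])
            p = self.names[p][1]
        return result


lst = ArrayLinkedList(6)   # array size of 6 is an assumption
for name in ["Browning", "Turner", "Johnson", "Cray", "Mortimer"]:
    lst.add_item(name)
print(lst.in_order())
```

Unlinking the node from the free list before any other pointer is changed means the same few lines serve the empty-list, front-of-list and general cases.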
Deleting an item
Returning to the table as in Figure 35.2, shown again below, we will delete Johnson.


start = 0


nextfree = 4 


Figure 35.6 


follow the pointers until Johnson is found 
change Cray’s pointer to point to Turner 
change Johnson’s pointer to nextfree 
change nextfree to point to Johnson 


This is shown diagrammatically below. 


Before deletion:

start = 0, nextfree = 4

[0] Browning | 3  ->  [3] Cray | 2  ->  [2] Johnson | 1  ->  [1] Turner | null


After deletion:

start = 0, nextfree = 2

[0] Browning | 3  ->  [3] Cray | 1  ->  [1] Turner | null

Free space list: [2] Johnson | 4  ->  [4] (free) | 5

Figure 35.7 

Here is the simplified algorithm:

p = start
follow pointers until Names[p].pointer points to the name to delete
temp = Names[p].pointer                     //put 2 in temp
Names[p].pointer = Names[temp].pointer      //put 1 in Cray's pointer field
Names[temp].pointer = nextfree              //put 4 in Johnson's pointer field
nextfree = temp                             //put 2 in nextfree


Pseudocode algorithm for deleting an item


The following algorithm handles exceptions such as deleting from an empty list and deleting the first item
in the list. It also returns the empty space to the front of the free space list. It assumes that the name
to be deleted is present in the list.


procedure deleteItem(deleteName)
    // check for empty list
    if start == null then
        print("List is empty")
    else
        p = start
        if deleteName == Names[start].name then
            // the first item in the list is the one to be deleted
            temp = start
            start = Names[start].pointer
        else
            while deleteName != Names[Names[p].pointer].name
                p = Names[p].pointer
            endwhile
            // Names[p].pointer now points to the node to be deleted
            // adjust the pointers
            temp = Names[p].pointer
            Names[p].pointer = Names[temp].pointer
        endif
        // return the deleted node to the front of the free space list
        Names[temp].pointer = nextfree
        nextfree = temp
    endif
endprocedure
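
A possible Python version of the deletion algorithm is sketched below. The function name delete_item, the [name, pointer] node layout and the six-node starting array are assumptions made for illustration. Unlike the pseudocode, the sketch also copes with a name that is not in the list.

```python
def delete_item(names, start, nextfree, delete_name):
    """Remove delete_name from the list and return the new (start, nextfree)."""
    if start is None:
        print("List is empty")
        return start, nextfree
    if names[start][0] == delete_name:
        # deleting the first item: start moves on to the second item
        temp = start
        start = names[start][1]
    else:
        p = start
        # peek ahead until the next node holds the name to delete
        while names[p][1] is not None and names[names[p][1]][0] != delete_name:
            p = names[p][1]
        temp = names[p][1]
        if temp is None:
            print("Name not found")
            return start, nextfree
        names[p][1] = names[temp][1]     # bypass the deleted node
    names[temp][1] = nextfree            # link the node into the free space list
    nextfree = temp
    return start, nextfree


# Assumed starting state: four names stored, two free nodes
names = [["Browning", 3], ["Turner", None], ["Johnson", 1],
         ["Cray", 2], [None, 5], [None, None]]
start, nextfree = delete_item(names, 0, 4, "Johnson")
print(start, nextfree, names[3], names[2])
```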
Exercises

1. A list data structure can be represented using an array.


The pseudocode algorithm below can be used to carry out one useful operation on a list. 


p = 1
if ListLength > 0 then
    while p <= ListLength AND List[p] < NewItem
        p = p + 1
    endwhile
    for q = ListLength downTo p
        List[q + 1] = List[q]
    next q
endif
List[p] = NewItem
ListLength = ListLength + 1


The initial values of the variables for one particular execution of the algorithm are shown in the
trace table below, labelled Table 1.


(a) Draw the trace table for the execution of the algorithm. The first line is given and you will need to
draw extra rows. [4]


Table 1


A-Level only

(b) Describe the purpose of the algorithm in Figure 1. [1]
(c) A list implemented using an array is a static data structure. The list could be implemented
using a linked list as a dynamic data structure instead.
Describe one difference between a static data structure and a dynamic data structure. [1]

2. (a) The birds Robin, Sparrow, Blackbird, are entered, in the order given, into a linked list so
that they may be processed alphabetically. Draw a diagram of this linked list. [2]

(b) Redraw the diagram after two additional items, Chaffinch and Goldfinch, are added. [2]
(c) Show the list implemented in an array of records, with each node holding a data item and a
pointer, after the addition of the new items. [4]

(d) Write a pseudocode algorithm to print out the birds in the list in alphabetical order. [4]


Chapter 36 — Stacks


Objectives 


* Be familiar with the concept and uses of a stack
* Be able to describe the creation and maintenance of data within a stack
* Be able to describe and apply the following operations: push, pop, peek (or top), test for empty
  stack, test for full stack
* Be able to explain how a stack frame is used with subroutine calls to store return addresses,
  parameters and local variables


Concept of a stack 


A stack is a Last In, First Out (LIFO) data structure. This means that, like a 
stack of plates in a cafeteria, items are added to the top and removed from 
the top. 


Applications of stacks 


A stack is an important data structure in Computing. Stacks are used in calculations, and to hold return 
addresses when subroutines are called. When you use the Back button in your Web browser, you will be 
taken back through the previous pages that you looked at, in reverse order as their URLs are removed 
from the stack and reloaded. When you use the Undo button in a word processing package, the last 
operation you carried out is popped from the stack and undone. 


Implementation of a stack 
A stack may be implemented as either a static or dynamic data structure. 


A static data structure such as an array can be used with two additional variables, one being a pointer to 
the top of the stack and the other holding the size of the array (the maximum size of the stack). 


[Figure: an array used as a stack, with a pointer to the top of stack]


Operations on a stack
The following operations are required to implement a stack:

* push(item) adds a new item to the top of the stack
* pop() removes and returns the top item from the stack
* peek() returns the top item from the stack but does not remove it
* isEmpty() tests to see whether the stack is empty, and returns a Boolean value
* isFull() tests to see whether the stack is full, and returns a Boolean value


[Table: a sequence of operations on a stack s, including s.isEmpty(), s.push('Red') and s.push('Green'),
showing the stack contents after each one, for example ['Blue', 'Red', 'Green'], and any value returned]

A-Level only
The following pseudocode implements four of the stack operations using a fixed size array. The array s is
assumed to be indexed from 0 to maxSize - 1, so the stack is full when top reaches maxSize - 1.


function isEmpty
    if top == -1 then
        return True
    else
        return False
    endif
endfunction


function isFull
    if top == maxSize - 1 then
        return True
    else
        return False
    endif
endfunction


procedure push(item)
    if isFull then
        print("Stack is full")
    else
        top = top + 1
        s[top] = item
    endif
endprocedure


function pop
    if isEmpty then
        print("Stack is empty")
    else
        item = s[top]
        top = top - 1
        return item
    endif
endfunction
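
The fixed-size array version can be tried out in Python along the following lines. This is a sketch only: the class name ArrayStack and the method names are assumptions, a Python list of fixed length stands in for the array, and a peek method has been added for completeness.

```python
class ArrayStack:
    """Stack held in a fixed-size array, with top as the stack pointer."""

    def __init__(self, max_size):
        self.s = [None] * max_size
        self.max_size = max_size
        self.top = -1                     # -1 means the stack is empty

    def is_empty(self):
        return self.top == -1

    def is_full(self):
        return self.top == self.max_size - 1

    def push(self, item):
        if self.is_full():
            print("Stack is full")        # pushing now would cause overflow
        else:
            self.top += 1
            self.s[self.top] = item

    def pop(self):
        if self.is_empty():
            print("Stack is empty")       # popping now would cause underflow
            return None
        item = self.s[self.top]
        self.top -= 1
        return item

    def peek(self):
        return None if self.is_empty() else self.s[self.top]


stack = ArrayStack(3)
for colour in ["Blue", "Red", "Green"]:
    stack.push(colour)
full_after_three = stack.is_full()
top_item = stack.peek()
popped = stack.pop()
print(full_after_three, top_item, popped, stack.top)
```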


Some languages, such as Python, make it very easy to implement a stack using the built-in dynamic
list data structure, with the top of the stack being the last element of the list.


The function len(s) can be used to determine whether the stack is empty, and if it is not, pop() will
remove and return the top element. The built-in method append(item) will append or push an item
onto the top of the stack (the last element of the list).
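
A short sketch of a Python list used as a stack is given below. The variable names are illustrative; append, pop and len are the built-in list operations mentioned above.

```python
s = []                         # an empty stack
was_empty = len(s) == 0        # test for an empty stack

s.append("Blue")               # push
s.append("Red")
s.append("Green")

top_item = s[-1]               # peek: look at the last element without removing it
popped = s.pop()               # pop: remove and return the last element

print(was_empty, top_item, popped, s)
```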


Overflow and underflow 


A stack will always have a maximum size, because memory cannot grow indefinitely. If the stack is 
implemented as an array, a full stack can be tested for by examining the value of the stack pointer. An 
attempt to push another item onto the stack would cause overflow so an error message can be given to 
the user to avoid this. Similarly, if the stack pointer is -1, the stack is empty and underflow will occur if 
an attempt is made to pop an item. 


Functions of a call stack 


A major use of the stack data structure is to store information about the active subroutines while a 
computer program is running. The details are hidden from the user in all high level languages. 


Holding return addresses 


The call stack keeps track of the address of the instruction that control should return to when a 
subroutine ends (the return address). Several subroutines may be nested, so that the stack may 
contain several return addresses which will be popped as each subroutine completes. For example, a 
subroutine which draws a robot may call subroutines drawCircle, drawRectangle etc. Subroutine 
drawRectangle may in turn call a subroutine drawLine. 


A recursive subroutine may contain several calls to itself, so that with each call, a new item (the return
address) is pushed onto the stack. When the recursion finally ends, the return addresses that have been
pushed onto the stack each time the routine is called are popped one after the other, each time the end
of the subroutine is reached. If the programmer makes an error and the recursion never ends, sooner or
later memory will run out, the stack will overflow and the program will crash.
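
The effect of a recursion that never ends can be seen safely in Python, which limits the depth of the call stack and raises a RecursionError when the limit is passed, rather than letting memory run out. The function name count_down is purely illustrative.

```python
def count_down(n):
    # Deliberate error: there is no base case, so the recursion never ends
    return count_down(n - 1)


try:
    count_down(10)
    outcome = "finished"
except RecursionError:
    # Python stopped the program before the call stack could overflow
    outcome = "call stack limit reached"

print(outcome)
```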


Holding parameters 


Parameters required for a subroutine (such as, for example, the centre coordinates, line colour and 
thickness for a circle subroutine) may be held on the call stack. Each call to a subroutine will be given 
separate space on the call stack for these values. 


Local variables 


A subroutine frequently uses local variables which are accessible and usable only within the subroutine.
These may also be held in the call stack. Each separate call to a subroutine gets its own space for
its local variables. Storing local variables on the call stack is generally much more efficient than using
dynamic memory allocation, which uses heap space, because reserving and releasing stack space only
involves moving the stack pointer.


The stack frame 


A call stack is composed of stack frames. Each stack frame corresponds to a call to a subroutine which 
has not yet terminated. 


[Figure: a call stack in which the stack pointer marks the top of stack; the stack frame for drawLine
lies above the stack frame for drawRectangle]


Exercises 


1. A Last In, First Out (LIFO) data structure has a pointer called top. 


(a) What is this type of data structure known as? [1]
(b) Name and briefly describe one type of error that could occur when attempting to add
a data item or remove a data item from the data structure. [2]
(c) Describe briefly one use of this type of data structure in a computer system. [2]
(d) Write a pseudocode procedure for reversing the elements of a queue with the aid of a stack. [6]


Chapter 37 — Hash tables


Objectives 


* Be familiar with a hash table and its uses
* Be able to apply simple hashing algorithms
* Know what is meant by a collision and how collisions are handled using rehashing
* Be familiar with the concept of a dictionary
* Be familiar with simple applications of a dictionary


Hashing 


Large collections of data, for example customer records in a database, need to be accessible very 
quickly without having to look through all the records. This can be done by holding an index of the 
physical address on the file where the data is held. But how is the index created? 


The answer is that a hashing algorithm is applied to the value in the key field of each record to 
transform it into an address. Normally there are many more possible keys than actual records that need 
to be stored. For example, if 300 records are to be stored, each having a unique 6-digit identifier or key, 
1000 free spaces may be allocated to store the records. 


One common hashing algorithm is to divide the key by the number of available addresses and take the 
remainder as the address. Using the algorithm (address = key mod 1000): 


453781 would be stored at address 781 
447883 would be stored at address 883 
134552 would be stored at address 552 


What will happen when the record with key 631552 is to be stored? This will hash to the same address 
as 134552 and is called a synonym. Synonyms are bound to occur with any hashing algorithm, and two 
record keys hashing to the same address is referred to as a collision. 


A simple way of dealing with collisions is to store the item in the next available free space. Thus 631552
would be stored at address 553, assuming this space is unoccupied.
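
The arithmetic above can be checked with a few lines of Python. The function name hash_address is an assumption; the % operator gives the remainder, which is what mod means in the text.

```python
def hash_address(key, table_size=1000):
    """Divide the key by the number of addresses and keep the remainder."""
    return key % table_size


keys = [453781, 447883, 134552, 631552]
addresses = {key: hash_address(key) for key in keys}

# 134552 and 631552 are synonyms: both hash to address 552
collision = addresses[134552] == addresses[631552]
print(addresses, collision)
```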


Hash table


A hash table is a collection of items stored in such a way that they can quickly be located. The hash table
could be implemented as an array or list of a given size with a number of empty spaces. An empty hash
table that can store a maximum of 11 items is shown below, with spaces labelled 0, 1, 2, ... 10.


0       1       2       3       4       5       6       7       8       9       10
Empty   Empty   Empty   Empty   Empty   Empty   Empty   Empty   Empty   Empty   Empty


Now assume we wish to store items 78, 55, 34, 19 and 29 in the table using the method described 
above, using division by 11 and taking the remainder. Collisions are stored in the next available free slot. 


First of all, calculate the hash value of each item to be stored.

Item          78    55    34    19    29
Hash value    1     0     1     8     7

Each of these items can now be inserted into its location in the hash table. 34 collides with 78 at
location 1, so it is stored in the next free slot, location 2.


0     1     2     3     4     5     6     7     8     9     10
55    78    34    -     -     -     -     29    19    -     -


Searching for an item 


When searching for an item, these steps are followed: 


* apply the hashing algorithm to the key field of the item
* examine the resulting cell in the list
* if the item is there, return the item
* if the cell is empty, the item is not in the table
* if there is another item in that spot, keep moving forward until either the item is found or a blank cell
  is encountered, when it is apparent that the item is not in the table
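
The insertion and search steps can be sketched in Python as follows, using the 11-space table and the items 78, 55, 34, 19 and 29 from the example above. The function names insert and search are assumptions, None marks an empty cell, and the search wraps round to the start of the table if it reaches the end.

```python
def insert(table, item):
    """Store item at its hash address, or in the next free cell after it."""
    size = len(table)
    index = item % size
    for _ in range(size):
        if table[index] is None:
            table[index] = item
            return index
        index = (index + 1) % size        # collision: try the next cell
    raise OverflowError("hash table is full")


def search(table, item):
    """Return the location of item, or None if it is not in the table."""
    size = len(table)
    index = item % size
    for _ in range(size):
        if table[index] is None:          # blank cell: item cannot be present
            return None
        if table[index] == item:
            return index
        index = (index + 1) % size        # keep moving forward
    return None


table = [None] * 11
for item in (78, 55, 34, 19, 29):
    insert(table, item)

print(table)
print(search(table, 34), search(table, 12))
```

Searching for 12 examines cells 1 and 2, then stops at the empty cell 3.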


Other hashing algorithms 


To be as efficient as possible, the hashing algorithm needs to be chosen so that it generates as few
collisions as possible. This will depend to some extent on the distribution of the items to be hashed.


Folding method 


There are many other algorithms for determining hash values. The folding method divides the item into
equal parts, and adds the parts to give the hash value. For example, a phone number 01543 677896 can
be divided into groups of two, namely 01, 54, 36, 77, 89, 6. Adding these together, we get 263. If the
table has fewer spaces than the maximum possible sum generated by this method, say 100 cells, then
the extra step of dividing by 100 and taking the remainder needs to be applied, giving location 63.
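
A small Python sketch of the folding method, applied to the phone number in the example, is shown below. The function name fold_hash and its parameters are assumptions made for illustration.

```python
def fold_hash(number, group_size=2, table_size=100):
    """Split the digits into groups, add the groups, then take the remainder."""
    digits = number.replace(" ", "")
    groups = [int(digits[i:i + group_size])
              for i in range(0, len(digits), group_size)]
    folded = sum(groups)
    return folded, folded % table_size


folded, location = fold_hash("01543 677896")
print(folded, location)
```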


[Table with columns: Item, 'Folded' value, Remainder, Location in hash table]


Hashing a string


A hash function can be created for alphanumeric strings by using the ASCII code for each character.
A portion of the ASCII table is shown below:


To hash the word CAB, we could add up the ASCII values for each letter and, if there are 11 spaces in 
the hash table, for example, divide by 11 and take the remainder as its hash value. 


67 + 65 + 66 = 198          Hash value = 198 mod 11 = 0


so CAB goes in location 0 (assuming that location is empty). 
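
The calculation for CAB can be reproduced in Python, where the built-in function ord returns the character code of a letter. The function name string_hash is an assumption.

```python
def string_hash(word, table_size=11):
    """Add up the character codes and take the remainder on division."""
    total = sum(ord(ch) for ch in word)
    return total % table_size


codes = [ord(ch) for ch in "CAB"]
location = string_hash("CAB")
print(codes, sum(codes), location)
```

Note that words made from the same letters in a different order, such as ABC, give the same total and so hash to the same location.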


Collision resolution


The fuller the hash table becomes, the more likely it is that there will be collisions, and this needs to be 
taken into account when designing the hashing algorithm and deciding on the table size. For example, 
the size of the table could be designed so that when all the items are stored, only 70% of the table's cells 
are occupied. 


Rehashing is the name given to the process of finding an empty slot when a collision has occurred.
The rehashing algorithm used above simply looks for the next empty slot. It will loop round to the first
cell if the end of the table is reached. A variation on this would be to look at every third cell, for example
(the "plus 3" rehash). Alternatively, the hash value could be incremented by 1, 3, 5, 7... until a free space
is found.
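
The order in which cells are examined by the "next empty slot" and "plus 3" rehashes can be sketched as below, assuming a table of 11 cells. The function name probe_sequence is illustrative.

```python
def probe_sequence(hash_value, table_size, step=1):
    """Cells examined in order, starting at hash_value and looping round."""
    return [(hash_value + step * i) % table_size for i in range(table_size)]


next_slot = probe_sequence(9, 11)          # look at the next cell each time
plus_three = probe_sequence(1, 11, 3)      # the "plus 3" rehash

print(next_slot)
print(plus_three)
```

With 11 cells the "plus 3" rehash eventually visits every cell. That is only guaranteed when the step and the table size have no common factor.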


Different hashing and rehashing methods will work more efficiently on different data sets — the aim is to 
minimise collisions. 


Uses of hash tables


Hash tables are primarily used for efficient lookup, so that for example an index would typically be 
organised as a hash table. A hash table could be used to look up, say a person’s telephone number 
given their name, or vice versa. They can also be used to store data such as user codes and encrypted 
passwords that need to be looked up and verified quickly. 


Hash tables are used in the implementation of the data structure called a dictionary, which is discussed
below. A dictionary is a useful data structure for implementing graphs, introduced in the next chapter.


Dictionaries


A dictionary is an abstract data type consisting of associated pairs of items, where each pair consists of
a key and a value. It is a built-in data structure in Python and Visual Basic, for example. When the user
supplies the key, the associated value is returned. Items can easily be amended, added to or removed
from the dictionary as required.


In Python, dictionaries are written as comma-delimited pairs in the format key:value and enclosed in curly 
braces. For example: 


IDs = {342: 'Harry', 634: 'Jasmine', 885: 'Max', 571: 'Sheila'}


Operations on dictionaries 


It is possible to implement a dictionary using either a static or a dynamic data structure. The 
implementation needs to include the following operations: 


* Create a new empty dictionary
* Add a new key:value pair to the dictionary
* Delete a key:value pair from the dictionary
* Amend the value in a key:value pair
* Return a value associated with key k
* Return True or False depending on whether key is in the dictionary
* Return the length of the dictionary, i.e. the number of key:value pairs


An interactive Python session is shown below: 


>>> IDs = {342: 'Harry', 634: 'Jasmine', 885: 'Max', 571: 'Sheila'}
>>> IDs
{634: 'Jasmine', 571: 'Sheila', 885: 'Max', 342: 'Harry'}
>>> IDs[885]
'Max'
>>> IDs[333] = 'Maria'
>>> IDs
{634: 'Jasmine', 571: 'Sheila', 885: 'Max', 342: 'Harry', 333: 'Maria'}
>>> IDs[885] = 'Maxine'
>>> IDs
{634: 'Jasmine', 571: 'Sheila', 885: 'Maxine', 342: 'Harry', 333: 'Maria'}
>>> del IDs[885]
>>> IDs
{634: 'Jasmine', 571: 'Sheila', 342: 'Harry', 333: 'Maria'}
>>> 634 in IDs
True
>>> len(IDs)
4


Note that in the session above the pairs are not displayed in any particular sequence. (From Python 3.7
onwards, a dictionary keeps its pairs in the order in which they were inserted.) The key is hashed using a
hashing algorithm and placed at the resulting location in a hash table, so that a fast lookup is possible.


Exercises

1. Student records held by a school are stored in a database which organises the data in files
using hashing.


(a) In the context of storing data in a file, explain what a hash function is. 


(b) The system allows for a maximum of 1000 unique 6-digit integer student IDs in the file
holding current student records. Give an example of a hashing function that could be used
to find a particular record. Ignore collisions.


2. A bank has a number of safety deposit boxes in which customers can store valuable documents
or possessions. The details of which box is rented by a customer with a particular account number
are held in a dictionary data structure. Sample entries in the dictionary are:


{0083456: 'C11', 0154368: 'B74', 1178612: 'B6', 0567123: 'A34'}
(a) What value will be returned by a lookup operation using the key 1178612?
(b) The dictionary is implemented using a hash table, using the algorithm
accountNumber mod 500
What value is returned by the hashing function when it is applied to account number 0093421?
(c) What is the maximum number of entries that can be made in the dictionary?
(d) (i) Explain what is meant by a collision.
(ii) Give an example of how a collision might occur in this scenario, using sample account
numbers.
(iii) Describe one way of dealing with collisions in the hash table.


Chapter 38 — Graphs


Objectives 
* Be aware of a graph as a data structure used to represent complex relationships
* Be familiar with typical uses for graphs
* Be able to explain the terms: graph, weighted graph, vertex/node, edge/arc, undirected graph,
  directed graph
* Know how an adjacency matrix and an adjacency list may be used to represent a graph
* Be able to compare the use of adjacency matrices and adjacency lists


Definition of a graph 


A graph is a set of vertices or nodes connected by edges or arcs. The edges may be one-way or
two-way. In an undirected graph, all edges are bidirectional. If the edges in a graph are all one-way, the
graph is said to be a directed graph or digraph.


[Figure: a graph of the towns Bury St Edmunds, Stowmarket, Ipswich, Framlingham, Woodbridge and
Wickham Market, with a distance marked on each edge]


Figure 38.1: An undirected graph with weighted edges 


The edges may be weighted to show there is a cost to go from one vertex to another as in Figure 38.1.
The weights in this example represent distances between towns. A human driver can find their way
from one town to another by following a map, but a computer needs to represent the information about
distances and connections in a structured, numerical representation.


Figure 38.2: A directed, unweighted graph


Implementing a graph
Two possible implementations of a graph are the adjacency matrix and the adjacency list. 


The adjacency matrix 


A two-dimensional array can be used to store information about a directed or undirected graph. 
Each of the rows and columns represents a node, and a value stored in the cell at the intersection 
of row i, column j indicates that there is an edge connecting node i and node j. 


In the case of an undirected graph, the adjacency matrix will be symmetric, with the same entry in (0,1) 
as in (1,0), for example. 


An unweighted graph may be represented with 1s instead of weights, in the relevant cells. 
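
As an illustration, an adjacency matrix can be held in Python as a list of lists. The four-node graph and its weights below are hypothetical (they are not taken from the figures), the name add_edge is an assumption, and 0 is used to mean "no edge".

```python
# Hypothetical undirected, weighted graph with four nodes
nodes = ["A", "B", "C", "D"]
position = {name: i for i, name in enumerate(nodes)}

# Start with no edges: every cell holds 0
matrix = [[0] * len(nodes) for _ in nodes]


def add_edge(a, b, weight=1):
    """Record an undirected edge by filling in both cell (a, b) and cell (b, a)."""
    matrix[position[a]][position[b]] = weight
    matrix[position[b]][position[a]] = weight


add_edge("A", "B", 5)
add_edge("A", "C", 4)
add_edge("C", "D", 2)

# An undirected graph always gives a symmetric matrix
is_symmetric = all(matrix[i][j] == matrix[j][i]
                   for i in range(len(nodes)) for j in range(len(nodes)))

for row in matrix:
    print(row)
```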


Advantages and disadvantages of the adjacency matrix 
An adjacency matrix is very convenient to work with, and adding an edge is very simple. However, a 
sparse graph with many nodes but not many edges will leave most of the cells empty, and the larger 
the graph, the more memory space will be wasted. Another consideration is that using a static two- 
dimensional array, it is harder to add or delete nodes. 


The adjacency list 
An adjacency list is a more space-efficient way to implement a sparsely connected graph. A list of all the
nodes is created, and each node points to a list of all the adjacent nodes to which it is directly linked. The
adjacency list can be implemented as a list of dictionaries, with the key in each dictionary being the node
and the value, the edge weight.

The unweighted graph in Figure 38.2 would be represented as shown below, with the adjacency list
containing lists of nodes adjacent to each node. A dictionary data structure is not required here as there
are no edge weights.


The advantage of this implementation is that it uses much less memory to represent a sparsely
connected graph.
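
One way of writing adjacency lists in Python is sketched below. Both graphs are hypothetical examples rather than the graphs in the figures. Here a dictionary keyed on node name is used to hold the adjacency lists, which is one possible arrangement among several.

```python
# Hypothetical undirected, weighted graph: each node maps to a dictionary
# of {adjacent node: edge weight}
weighted = {
    "A": {"B": 5, "C": 4},
    "B": {"A": 5},
    "C": {"A": 4, "D": 2},
    "D": {"C": 2},
}

# Hypothetical directed, unweighted graph: each node maps to a plain list
# of the nodes that its outgoing edges lead to
directed = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": ["A"],
}

neighbours_of_c = sorted(weighted["C"])
weight_c_to_d = weighted["C"]["D"]
edge_count = sum(len(adjacent) for adjacent in directed.values())

print(neighbours_of_c, weight_c_to_d, edge_count)
```

Only the edges that exist are stored, so a sparse graph takes far fewer entries than the n by n cells of a matrix.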


Traversing a graph 


There are two ways to traverse a graph so that every node is visited. 


e A depth-first traversal 


« A breadth-first traversal 


Depth-first traversal 


In this traversal, we go as far down one route as we can before backtracking and taking the next route. 


Consider the following graph: 


Figure 38.3 
Starting at A, we can either go left or right. We will choose to go left whenever there is a choice of routes.


We visit C, F, J, H, D, G. We have already visited F so we have reached the end of this path. Back up to
D and visit E. Now we must retrace our steps via D, H, J, F, C, to A, and go down the alternative route to
B and K.

Nodes were visited in the order A C F J H D G E B K.


This sequence involved some choices, so is not unique. Another depth-first route would be 
ACDEHJFGBK. 
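
A recursive depth-first traversal might be written in Python as follows. The edges listed are an assumption, inferred only from the first walk-through above, so the graph in Figure 38.3 may well contain further edges. The order of each adjacency list stands in for "go left whenever there is a choice".

```python
# Edges assumed from the walk-through; neighbours are listed left to right
graph = {
    "A": ["C", "B"],
    "C": ["A", "F"],
    "F": ["C", "J", "G"],
    "J": ["F", "H"],
    "H": ["J", "D"],
    "D": ["H", "G", "E"],
    "G": ["D", "F"],
    "E": ["D"],
    "B": ["A", "K"],
    "K": ["B"],
}


def depth_first(graph, node, visited=None):
    """Visit node, then go as far as possible down each unvisited neighbour."""
    if visited is None:
        visited = []
    visited.append(node)
    for neighbour in graph[node]:
        if neighbour not in visited:
            depth_first(graph, neighbour, visited)
    # returning from the call is the "backtracking" step
    return visited


order = depth_first(graph, "A")
print(" ".join(order))
```

Each recursive call places a new frame on the call stack, so the stack is what remembers the route to retrace.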


Breadth-first search 


With a breadth-first traversal, we visit all the neighbours of a node, and then all the neighbours of the 
first node visited, all the neighbours of the second node and so on, before moving further away from the 
start node. 


Consider the graph below: 


Figure 38.4 


Starting at A, we visit B, then C, then D (or we could have started by visiting C or D). 


Then we move to B, which has no unvisited neighbours, so we back up to A and go to C. From C, we visit E
before returning to A. Next, we go to D and visit F. All nodes have now been visited, in the order
A B C D E F.
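
A breadth-first traversal is commonly written with a queue holding the nodes whose neighbours have still to be explored. The sketch below takes that approach, which is an assumption, since the walk-through above does not name a data structure. The edges are inferred from the description of Figure 38.4.

```python
from collections import deque

# Edges assumed from the description: A-B, A-C, A-D, C-E, D-F
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "E"],
    "D": ["A", "F"],
    "E": ["C"],
    "F": ["D"],
}


def breadth_first(graph, start):
    """Visit all neighbours of each node before moving further from start."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()            # next node to explore
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)   # explore its neighbours later
    return visited


order = breadth_first(graph, "A")
print(" ".join(order))
```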


Applications of graphs 


Graphs may be used to represent, for example: 


* computer networks, with nodes representing computers and weighted edges representing the
  bandwidth between them
* roads between towns, with edge weights representing distances, rail fares or journey times
* tasks in a project, some of which have to be completed before others
* web pages and links (see Google's PageRank algorithm in Section 5)


Exercises


1. The figure below shows an adjacency matrix representation of a directed graph (digraph). 


(a) Draw a diagram of the directed graph, showing edge weights. [3]
(b) Draw an adjacency list representing this graph. [3]


(c) Give one advantage of using an adjacency matrix to represent a graph, and one advantage
of using an adjacency list. Explain the circumstances in which each is more appropriate. [4]


2. An undirected graph is shown below. 


[4] 
(b) List the nodes in the order in which they would be visited using
(i) a depth-first search [3]
(ii) a breadth-first search [3]


Chapter 39 — Trees


Objectives 


* Define a binary tree as a rooted tree in which each node has at most two children
* Create and traverse a binary tree
* Create, search and traverse a binary search tree


Concept of a tree 


Trees are a very common data structure in many areas of computer science and other contexts. 
Like a tree in nature, a rooted tree has a root, branches and leaves, the difference being that a tree 
in computer science has its root at the top and its leaves at the bottom. 


Typical uses for rooted trees include: 


* manipulating hierarchical data, such as folder structures or moves in a game
* making information easy to search (see binary tree search below)
* manipulating sorted lists of data


Generations of a family may be thought of as having a tree structure: 


[Figure: a family tree with the root node, a branch or edge, and a leaf node labelled]


The tree shown above has a root node, and is therefore defined as a rooted tree. Here are some terms 
used in connection with rooted trees: 


Node: The nodes contain the tree data

Edge: An edge connects two nodes. Every node except the root is connected by exactly one
edge from another node

Root: This is the only node that has no incoming edges

Child: The set of nodes that have incoming edges from the same node

Parent: A node is a parent of all the nodes it connects to with outgoing edges

Subtree: The set of nodes and edges comprised of a parent and all descendants of the parent.
A subtree may also be a leaf

Leaf node: A node that has no children


Note that a rooted tree is a special case of a connected graph. A node can only be connected to
one parent node, and to its children. It is described as having no cycles because there can be no
connection between children, or between branches, for example from Ben to Anna or Petra to Kate.


A binary search tree 


A binary tree is a rooted tree in which each node has a maximum of two children. A binary search 
tree holds items in such a way that the tree can be searched quickly and easily for a particular item, new 
items can be easily added, and the whole tree can be printed out in sequence. A binary search tree is a 
typical use of a rooted tree. 


Constructing a binary search tree 
Suppose the following list of numbers is to be inserted into a binary tree, in the order given, in such a way 
that the tree can be quickly searched. 


17, 8, 4, 12, 22, 19, 14, 5, 30, 25 
The tree is constructed using the following algorithm: 


Place the first item at the root. Then for each item in the list, visit the root, which becomes the current
node, and branch left if the item is less than the value at the current node, and right if the item is greater
than the value at the current node. Continue down the branch, applying the rule at each node visited,
until there is no node on the branch to be followed. The item is then placed in that empty position, to the
left or right of the last node visited, depending on whether it is less than or greater than the value at
that node.


Following this algorithm, 17 is placed at the root. 8 is less than 17, so is placed at a new node to the left 
of the root. 


4 is less than 17, so we branch left at the root, branch left at 8, and place it to the left. 
12 is less than 17, so we branch left at the root, branch right at 8, and place it to the right. 


The final tree looks like this:

                17
              /    \
            8        22
          /   \     /   \
         4    12   19    30
          \     \        /
           5    14      25


To search the tree for the number 19, for example, we follow the same steps. 
19 is greater than 17, so branch right. 


19 is less than 22, so branch left. There it is! 
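
The construction and search rules can be sketched in Python as below. The class name Node and the function names insert and search are assumptions. The text does not say what to do with an item equal to one already in the tree, and this sketch sends it to the right.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(root, item):
    """Branch left or right from the root until an empty position is found."""
    if root is None:
        return Node(item)
    current = root
    while True:
        if item < current.data:
            if current.left is None:
                current.left = Node(item)
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = Node(item)
                return root
            current = current.right


def search(root, item):
    """Return the values visited on the way to item, or None if it is absent."""
    path = []
    current = root
    while current is not None:
        path.append(current.data)
        if item == current.data:
            return path
        current = current.left if item < current.data else current.right
    return None


root = None
for number in [17, 8, 4, 12, 22, 19, 14, 5, 30, 25]:
    root = insert(root, number)

path_to_19 = search(root, 19)
print(path_to_19, search(root, 13))
```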
Traversing a binary tree

There are three ways of traversing a tree:

* Pre-order traversal
* In-order traversal
* Post-order traversal


The names refer to whether the root of each sub-tree is visited before, between or after both branches
have been traversed.


Pre-order traversal


Draw an outline around the tree structure, starting to the left of the root. As you pass to the left of a node 
(where the red dot is marked), output the data in that node. 


The nodes will be visited in the sequence 17, 8, 4, 5, 12, 14, 22, 19, 30, 25 


A pre-order traversal may be used to produce prefix notation, used in functional programming languages. 
A simple illustration would be a function statement, x = sum a,b rather than x = a + b, in which 
the operation comes before the operands rather than between them, as in infix notation. 
In-order traversal 


Draw an outline around the tree structure, starting to the left of the root. As you pass underneath a node 
(where the red dot is marked), output the data in that node. 


The nodes will be visited in the sequence 4, 5, 8, 12, 14, 17, 19, 22, 25, 30. 


An in-order traversal of a binary search tree visits the nodes in ascending order. 


Post-order traversal 
Draw an outline around the tree structure, starting to the left of the root. As you pass to the right of a 
node (where the red dot is marked), output the data in that node. 


The nodes will be visited in the sequence 5, 4, 14, 12, 8, 19, 25, 30, 22, 17. 
Implementation of a binary search tree 


A binary search tree can be implemented using an array of records, with each node consisting of: 


• left pointer 
• data item 
• right pointer 

Alternatively, it could be held in a list of tuples, or three separate lists or arrays, one for each of the 
pointers and one for the data items. 


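The numbers 17, 8, 4, 12, 22, 19, 14, 5, 30, 25 used to construct the tree above could be held as 
follows: 


Index    | Left | Data | Right 
tree[0]  |  1   |  17  |  4 
tree[1]  |  2   |  8   |  3 
tree[2]  | -1   |  4   |  7 
tree[3]  | -1   |  12  |  6 
tree[4]  |  5   |  22  |  8 
tree[5]  | -1   |  19  | -1 
tree[6]  | -1   |  14  | -1 
tree[7]  | -1   |  5   | -1 
tree[8]  |  9   |  30  | -1 
tree[9]  | -1   |  25  | -1 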


For example, the left pointer in tree[0] points to tree[1] and the right pointer points to tree[4]. The value -1 
is a ‘rogue value’ which indicates that there is no child on the relevant side (left or right). 
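
As a rough sketch of how such an array of records might be filled in, the Python below stores each node as a `[left, data, right]` list and uses -1 as the rogue value. The function name `build_tree` and the record layout are assumptions made for illustration; the items are inserted in the order used when the tree was constructed earlier in the chapter.

```python
def build_tree(items):
    """Return a list of [left, data, right] records; -1 means 'no child'."""
    tree = []
    for item in items:
        tree.append([-1, item, -1])
        new_index = len(tree) - 1
        if new_index == 0:
            continue                                 # the first item is the root
        p = 0                                        # start each insertion at the root
        while True:
            side = 0 if item < tree[p][1] else 2     # 0 = left pointer, 2 = right pointer
            if tree[p][side] == -1:
                tree[p][side] = new_index            # attach the new node here
                break
            p = tree[p][side]                        # otherwise continue down the branch
    return tree


tree = build_tree([17, 8, 4, 12, 22, 19, 14, 5, 30, 25])
for i, (left, data, right) in enumerate(tree):
    print(f"tree[{i}]  left={left:2}  data={data:2}  right={right:2}")
```

The first line printed shows tree[0] with a left pointer of 1 and a right pointer of 4, matching the description above.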


Tree traversal algorithms 


We have looked at three tree traversal algorithms: in-order, pre-order and post-order. The pseudocode 
algorithm for each of these traversals is recursive. If you have not yet covered recursion, you may like to 
leave the pseudocode algorithms to be covered at a later date. 


The algorithm for an in-order traversal is 


traverse the left subtree 
visit the root node 
traverse the right subtree 


Example of in-order traversal 


An algebraic expression is represented by the following binary tree. It could be represented in memory 
as, for example, three 1-dimensional arrays or as a list with each list element holding the data and left 
and right pointers to the left and right subtrees. The value of the root node is stored as the first element 
of the list. 


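Figure 39.1 
Suppose this data is held as shown below: 


[Table not recoverable from the source: elements tree[0] to tree[6], each holding a left pointer, a data item and a right pointer, with the root node + held in tree[0]] 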


In pseudocode: 


procedure inorderTraverse(p) 
    if tree[p].left != -1 then 
        inorderTraverse(tree[p].left) 
    endif 
    print(tree[p].data) 
    if tree[p].right != -1 then 
        inorderTraverse(tree[p].right) 
    endif 
endprocedure 




A-Level only 


The routine is called with a statement inorderTraverse(0) 


Tracing through the algorithm, the nodes are output in the order a * b + c / d 
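
The recursive in-order algorithm can be tried out in Python as follows. This is a sketch under stated assumptions: each record is a `(left, data, right)` tuple, and the positions of the nodes in the list are invented for illustration. Only the root being held at index 0 comes from the text.

```python
# Each record is (left, data, right); -1 marks a missing child.
# Assumed layout: only the root at index 0 is taken from the text.
tree = [
    (1, "+", 2),     # tree[0]: root
    (3, "*", 4),     # tree[1]
    (5, "/", 6),     # tree[2]
    (-1, "a", -1),   # tree[3]
    (-1, "b", -1),   # tree[4]
    (-1, "c", -1),   # tree[5]
    (-1, "d", -1),   # tree[6]
]


def inorder_traverse(p, out):
    """Append the data items below node p to out: left subtree, node, right subtree."""
    left, data, right = tree[p]
    if left != -1:
        inorder_traverse(left, out)
    out.append(data)
    if right != -1:
        inorder_traverse(right, out)
    return out


print(" ".join(inorder_traverse(0, [])))   # a * b + c / d
```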


Use of in-order traversal algorithm 
An in-order traversal of a binary search tree outputs the items in ascending order. Searching for a single item does not require a traversal: only one path down from the root is followed. 


Algorithm for post-order traversal 
The algorithm for a post-order traversal is 


traverse the left subtree 
traverse the right subtree 
visit the root node 


In pseudocode: 


procedure postorderTraverse(p) 
    if tree[p].left != -1 then 
        postorderTraverse(tree[p].left) 
    endif 
    if tree[p].right != -1 then 
        postorderTraverse(tree[p].right) 
    endif 
    print(tree[p].data) 
endprocedure 


The nodes are output in the sequence a b * c d / +. This is the sequence in which algebraic expressions 
are written using Reverse Polish Notation, which is used by compilers to evaluate expressions. 
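
To see why the post-order sequence is convenient for evaluation, the sketch below produces the Reverse Polish sequence from an expression tree and then evaluates it with a stack. The node layout, the function names and the sample values for a, b, c and d are all assumptions chosen for illustration.

```python
import operator

# Assumed layout of the expression tree for a * b + c / d, root at index 0.
tree = [
    (1, "+", 2), (3, "*", 4), (5, "/", 6),
    (-1, "a", -1), (-1, "b", -1), (-1, "c", -1), (-1, "d", -1),
]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def postorder_traverse(p, out):
    """Append data items to out: left subtree, right subtree, then the node."""
    left, data, right = tree[p]
    if left != -1:
        postorder_traverse(left, out)
    if right != -1:
        postorder_traverse(right, out)
    out.append(data)
    return out


def evaluate_rpn(tokens, values):
    """Evaluate a Reverse Polish token list using a stack."""
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()      # the second operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(values[token])
    return stack.pop()


rpn = postorder_traverse(0, [])
print(" ".join(rpn))                                          # a b * c d / +
print(evaluate_rpn(rpn, {"a": 2, "b": 3, "c": 8, "d": 4}))    # 8.0
```

No brackets or precedence rules are needed: each operator simply acts on the two most recent values on the stack.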


Algorithm for pre-order traversal 
The algorithm for a pre-order traversal is 


visit the root node 
traverse the left subtree 
traverse the right subtree 


In pseudocode: 


procedure preorderTraverse(p) 
    print(tree[p].data) 
    if tree[p].left != -1 then 
        preorderTraverse(tree[p].left) 
    endif 
    if tree[p].right != -1 then 
        preorderTraverse(tree[p].right) 
    endif 
endprocedure 


A pre-order traversal may be used for producing a prefix expression from an expression tree such as 
the one shown in Figure 39.1. Prefix is used in some compilers and calculators. 
Exercises 


1. Data may be stored as a binary tree. 

(a) Show how the following data may be stored as a binary tree for subsequent processing in 
alphabetic order by drawing the tree. Assume that the first item is the root of the tree and 
the rest of the data items are inserted into the tree in the order given. 

Data items: magpie, robin, chaffinch, linnet, thrush, blackbird, fieldfare, skylark, pigeon. [3] 
(b) Show how the data could be represented using three one-dimensional arrays. [3] 


(c) List the order that the nodes would be visited using 


(i) a pre-order traversal [2] 
(ii) an in-order traversal [2] 
(iii) a post-order traversal [2] 


2. In what order should the following tree be traversed so that each section and subsection is 
printed in the correct sequence? [1] 


[Figure: tree diagram of sections and subsections] 
Section 8 


Boolean algebra 


In this section: 
Chapter 40 Logic gates and truth tables 


Chapter 41 Simplifying Boolean expressions 
Chapter 42 Karnaugh maps 


Chapter 43 Adders and D-type flip-flops 
Chapter 40 — Logic gates and truth tables 


Objectives 


• Construct a truth table for a variety of logic gates 
• Be familiar with drawing and interpreting logic gate circuit diagrams involving multiple gates 
• Complete a truth table for a given logic gate circuit 
• Write a Boolean expression for a given logic gate circuit 
• Draw an equivalent logic gate circuit for a given Boolean expression 
• Define problems using Boolean logic 


Binary logic 


At the most elementary level, an electronic device can only recognise the presence or absence of current 
or voltage. Either electricity is present or it isn't. This is a switch — on or off, true or false, 1 or 0. With a 
computer's semiconductor, the voltage at the input and output terminals is measured and is either high 
or low; 1 or 0. Computers comprise billions of these switches and manipulating these sequences of ONs 
and OFFs can change individual bits. 


Electronic logic gates can take one or more inputs and produce a single output. This output can become 
the input to another gate and a complicated cascaded sequence of logic gates can be implemented to 
form a circuit in, for example, the CPU. 


Simple logic gates and truth tables 


There are a number of different logic gates that are each designed to perform a different operation in 
terms of output. We will look at NOT, AND, OR and XOR gates. 


Each of these gates may be represented by a truth table showing the output for each possible input or 
combination of inputs. The four gates are shown below. Inputs are usually given algebraic letters such as 
A, B and C and output is usually represented by P or Q. 


NOT gate (negation) 


The NOT gate is represented by the symbol below and inverts the input. The small circle denotes an 
inverted input. 


Using 1s and 0s as inputs to a gate, its operation can be summarised in the form of a truth table. 


Input A | Output Q 
   0    |    1 
   1    |    0 


Q = NOT A 


The Boolean algebraic expression is written: Q = ¬A where ¬ represents NOT. 
AND gate (conjunction) 


Q = A AND B 
The Boolean expression for AND is written: Q = A ∧ B where ∧ represents AND. 


The truth table reflects the fundamental property of the AND gate: the output of A AND B is 1 only if input 
A and input B are both 1. 


OR gate (disjunction) 


Q = A OR B 


The Boolean expression for OR is written: Q = A ∨ B where ∨ represents OR. 


XOR gate (exclusive disjunction) 
The XOR (pronounced ex-or) gate stands for exclusive OR, meaning that the output will be true if one or 
other input is true, but not both. Compare this to the OR gate, which will accept two true inputs as true 
also. 


Input A | Input B | Output Q 
   0    |    0    |    0 
   0    |    1    |    1 
   1    |    0    |    1 
   1    |    1    |    0 


Q = A XOR B 


The Boolean algebraic expression is written: Q = A ⊻ B where ⊻ represents XOR, and is the 
equivalent of Q = (A ∧ ¬B) ∨ (¬A ∧ B). This gate is similar to the OR gate but excludes the condition 
where A and B are both true. XOR is referred to as exclusive OR, and OR is sometimes referred to as 
inclusive OR. 
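
The four gates can be modelled in a few lines of Python to print their truth tables and to confirm that XOR matches (A AND NOT B) OR (NOT A AND B). This is only a sketch: the function names are hypothetical, and Python's bitwise operators on the integers 0 and 1 stand in for the gates.

```python
from itertools import product


def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XOR(a, b):
    return a ^ b


print("A B | AND OR XOR")
for a, b in product((0, 1), repeat=2):
    print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {XOR(a, b)}")

# Check the equivalent expression for XOR against every input pair.
xor_matches = all(
    XOR(a, b) == OR(AND(a, NOT(b)), AND(NOT(a), b))
    for a, b in product((0, 1), repeat=2)
)
print(xor_matches)   # True
```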


Creating logic gate circuits 


Multiple logic gates can be connected to produce an output based on multiple inputs. 


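[Circuit diagram: input A passes through a NOT gate to give D; inputs B and C pass through an AND gate to give E; D and E are the inputs to an OR gate, which outputs Q] 


This circuit can be represented by the expression Q = ¬A ∨ (B ∧ C) 
or alternatively as Q = (NOT A) OR (B AND C) 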
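The equivalent truth table is shown below: 


Input A | Input B | Input C | D = NOT A | E = B AND C | Output Q = D ∨ E 
   0    |    0    |    0    |     1     |      0      |        1 
   0    |    0    |    1    |     1     |      0      |        1 
   0    |    1    |    0    |     1     |      0      |        1 
   0    |    1    |    1    |     1     |      1      |        1 
   1    |    0    |    0    |     0     |      0      |        0 
   1    |    0    |    1    |     0     |      0      |        0 
   1    |    1    |    0    |     0     |      0      |        0 
   1    |    1    |    1    |     0     |      1      |        1 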

Q1: Draw a truth table for the following circuit: 
[Circuit diagram not recoverable from the source] 


Q2: Show, by drawing a truth table for P = (A ∧ ¬B) ∨ (¬A ∧ B), that P = Q, where Q = A ⊻ B. 


Q3: Write the Boolean expression Q = - ((A v B) ~ C) using AND, OR, NOT, XOR instead of symbols. 
Draw the corresponding logic circuit. 


Q4: Write the Boolean expression represented by the logic diagram below, using AND, OR and NOT 
instead of symbols. Then write the same expression using symbols. What is the output if A, B 
and C are all True? 
Defining problems using Boolean logic 
We can define problems in terms of Boolean logic. 


Example 1 


A boiler has two sensors, a pressure sensor and a temperature sensor. If either the temperature (T) or the 
pressure (P) is too high, a valve (V) will close. 


This can be expressed as V = T ∨ P or alternatively as V = T OR P 


The table representing these conditions could be drawn as follows: 


                                 | Pressure too high (P = 1) | Pressure not too high (P = 0) 
Temperature too high (T = 1)     | V = 1                     | V = 1 
Temperature not too high (T = 0) | V = 1                     | V = 0 


Example 2 


A chemical process has a sensor to detect a dangerous situation, in which case it sounds an alarm (A). 
The alarm is sounded if: 


either temperature >= 100°C AND rotator is OFF 
or pH > 6 AND temperature < 100°C 


A table can be drawn to represent these conditions as Boolean values. 


Variable | Value | Meaning 
T        |   1   | Temperature >= 100°C 
T        |   0   | Temperature < 100°C 
R        |   1   | Rotator ON 
R        |   0   | Rotator OFF 
P        |   1   | pH > 6 
P        |   0   | pH <= 6 


The conditions can be written as 
A = (T ∧ ¬R) ∨ (P ∧ ¬T) or alternatively as A = (T AND NOT R) OR (P AND NOT T) 
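
The alarm condition translates directly into code. In the sketch below the function name `alarm` and its parameter names are hypothetical; T, R and P are taken to mean temperature >= 100°C, rotator ON and pH > 6, as in the example.

```python
def alarm(temperature, rotator_on, ph):
    """Return True if the alarm should sound: A = (T AND NOT R) OR (P AND NOT T)."""
    t = temperature >= 100     # T: temperature at or above 100 degrees C
    r = bool(rotator_on)       # R: rotator is ON
    p = ph > 6                 # P: pH above 6
    return (t and not r) or (p and not t)


print(alarm(120, False, 5))   # True:  hot and rotator OFF
print(alarm(120, True, 7))    # False: hot, but the rotator is ON
print(alarm(80, True, 7))     # True:  pH > 6 and temperature below 100
print(alarm(80, False, 5))    # False: neither condition is met
```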


Now the logic circuit for this process can be drawn as follows: 
Exercises 
1. (a) Complete the following truth table for the XOR logic gate. 


(b) Draw logic circuits for the following Boolean expressions: 


(i) Q = A ∨ B ∨ ¬B [3] 
(ii) Q = ¬A ∧ B ∨ C [3] 
(iii) Q = ¬(A ∨ B) ∨ (A ∧ C) [3] 


2. The figure below shows a logic circuit. 


(a) Write the equivalent Boolean expression. [4] 
(b) What are the values of F, G, H, K and Q if A, B, C, D and E are all equal to 1? [5] 
3. Three sensors A, B and C are used to monitor a process. A signal X is output from the circuit. 
X has the value 1 if either of the following conditions is met: 
Sensor A outputs 1 AND sensor B outputs 0 
Sensor B outputs 1 OR sensor C outputs 0 


Draw a logic circuit to represent these conditions. [5] 
Chapter 41 — Simplifying Boolean expressions 


Objectives 


• Use the following rules to derive or simplify statements in Boolean algebra: 
    o de Morgan's laws 
    o commutation 
    o association 
    o distribution 
    o absorption 
    o double negation 


• Write a Boolean expression for a given logic gate circuit, and vice versa 


A-Level only 
de Morgan’s laws 


Augustus de Morgan (1806-1871) was a Cambridge-educated mathematician, and Professor of Mathematics at what is now University College London, who formulated two 
theorems or laws relating to logic. These laws can be used to manipulate and simplify Boolean 
expressions. Although his theoretical work had little practical application in his lifetime, it became of 
major significance in the next century in the field of digital electronics, in which TRUE and FALSE can be 
replaced by ON and OFF or the binary numbers 0 and 1. 


Using de Morgan’s laws, any Boolean function can be rewritten using only AND and NOT functions, or 
only OR and NOT functions, and these can be further converted to an expression using only NAND functions or 
only NOR functions. 


Thus, any integrated circuit can be built from just one type of logic gate. This is an advantage in 
manufacturing where costs can be kept down by using only one type of gate. 


de Morgan’s first law 
¬(A ∨ B) = ¬A ∧ ¬B 


The truth of this is clear from the Venn diagram on the right. Suppose we 
have a variable X defined by 


X = ¬(A ∨ B) 


Looking at the Venn diagram, A ∨ B is represented by the white area. Since 
X is not in A ∨ B, it consists of all the grey area. This can be defined as 
everything not in A and not in B, i.e. 


X = ¬A ∧ ¬B 




de Morgan’s second law 
¬(A ∧ B) = ¬A ∨ ¬B 
Again, looking at the Venn diagram on the right, if 
X = ¬(A ∧ B) 


X cannot be in the white area, so must be in the red, orange, or grey 
areas. That is, X is either not in A, or not in B, or not in either. This is the 
definition of 


X = ¬A ∨ ¬B 


A-Level only 


To implement each of de Morgan's laws, follow these three steps: 


1. Complement both terms in the expression, e.g. A and B become ¬A and ¬B 
2. Change AND to OR and OR to AND 
3. Complement the result 
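
Both laws can be confirmed by checking every combination of inputs, and the same approach shows how NOT, AND and OR can each be built from NAND alone. The helper names `nand`, `not_`, `and_` and `or_` below are hypothetical, and the construction shown is one common choice rather than the only one.

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def not_(a):
    return nand(a, a)                 # NOT A = A NAND A


def and_(a, b):
    return not_(nand(a, b))           # AND is NAND followed by NOT


def or_(a, b):
    return nand(not_(a), not_(b))     # by de Morgan: A OR B = NOT(NOT A AND NOT B)


pairs = list(product((False, True), repeat=2))

first_law = all((not (a or b)) == ((not a) and (not b)) for a, b in pairs)
second_law = all((not (a and b)) == ((not a) or (not b)) for a, b in pairs)
nand_only = all(
    not_(a) == (not a) and and_(a, b) == (a and b) and or_(a, b) == (a or b)
    for a, b in pairs
)

print(first_law, second_law, nand_only)   # True True True
```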


Rules of Boolean algebra 


In addition to de Morgan's laws, there are several identities or “rules” which will help you to simplify 
Boolean expressions. The most useful are listed below. 


General rules 

1. X ∧ 0 = 0 
2. X ∧ 1 = X 
3. X ∧ X = X 
4. X ∧ ¬X = 0 
5. X ∨ 0 = X 
6. X ∨ 1 = 1 
7. X ∨ X = X 
8. X ∨ ¬X = 1 


Commutative rules 
9. X ∧ Y = Y ∧ X 
10. X ∨ Y = Y ∨ X 


A-Level only 
Associative rules 
11. X ∧ (Y ∧ Z) = (X ∧ Y) ∧ Z 
12. X ∨ (Y ∨ Z) = (X ∨ Y) ∨ Z 

Distributive rules 
13. X ∧ (Y ∨ Z) = (X ∧ Y) ∨ (X ∧ Z) 
14. (X ∨ Y) ∧ (W ∨ Z) = (X ∧ W) ∨ (X ∧ Z) ∨ (Y ∧ W) ∨ (Y ∧ Z) 

Absorption rules 
15. X ∨ (X ∧ Y) = X 
16. X ∧ (X ∨ Y) = X 

Double negation 
17. X = ¬¬X 
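
Any of these identities can be checked by brute force, since n variables have only 2^n combinations of values. The sketch below tests a selection of the rules; the helper `equivalent` and the dictionary layout are assumptions made for illustration.

```python
from itertools import product


def equivalent(f, g, n):
    """True if f and g give the same result for every combination of n inputs."""
    return all(bool(f(*v)) == bool(g(*v)) for v in product((False, True), repeat=n))


# rule number -> (left-hand side, right-hand side, number of variables)
rules = {
    "13": (lambda x, y, z: x and (y or z),
           lambda x, y, z: (x and y) or (x and z), 3),
    "14": (lambda x, y, w, z: (x or y) and (w or z),
           lambda x, y, w, z: (x and w) or (x and z) or (y and w) or (y and z), 4),
    "15": (lambda x, y: x or (x and y), lambda x, y: x, 2),
    "16": (lambda x, y: x and (x or y), lambda x, y: x, 2),
    "17": (lambda x: x, lambda x: not (not x), 1),
}

results = {number: equivalent(f, g, n) for number, (f, g, n) in rules.items()}
print(results)   # every rule maps to True
```

The same helper can be used to check the answer to a simplification exercise: if the original and simplified expressions are not equivalent, a mistake has been made somewhere.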
Example 1 
Use de Morgan's laws and the laws of Boolean algebra to simplify the following Boolean expression: 
Q = ¬(¬(X ∧ ¬Y) ∧ (¬Y ∨ ¬Z)) 
Answer: Q = ¬(¬(X ∧ ¬Y) ∧ ¬(Y ∧ Z)) (using de Morgan's second law) 
          = (X ∧ ¬Y) ∨ (Y ∧ Z) (using de Morgan's first law) 
Example 2 
Use de Morgan’s laws to simplify A ∧ B ∨ ¬A ∨ ¬B 


Answer: Put brackets between the parts of the expression separated by ∨ (OR) 
(A ∧ B) ∨ (¬A ∨ ¬B) 
= (A ∧ B) ∨ ¬(A ∧ B) (using de Morgan's second law) 


= 1 (using Rule 8 above) 


Example 3 
Use Boolean algebra to show that (A ∨ B) ∧ (A ∨ C) = A ∨ (B ∧ C) 
Answer: (A ∨ B) ∧ (A ∨ C) = (A ∧ A) ∨ (B ∧ A) ∨ (A ∧ C) ∨ (B ∧ C) (Distributive law) 
                          = A ∨ (B ∧ A) ∨ (A ∧ C) ∨ (B ∧ C) (since A ∧ A = A) 
                          = A ∨ (A ∧ B) ∨ (A ∧ C) ∨ (B ∧ C) (Commutative law) 
                          = A ∨ (A ∧ C) ∨ (B ∧ C) (Absorption law) 
                          = A ∨ (B ∧ C) (Absorption law) 
Example 4 


A single output Q is produced from three inputs X, Y and Z. Q is 1 only if X and Y are 1, or Z is 1 
and Y is 0. 


Write the Boolean expression to represent this circuit. 
Answer: There are two separate logic gates involved here: X ∧ Y and Z ∧ ¬Y. 
The outputs from these two gates are input to an OR gate. 
Q = (X ∧ Y) ∨ (Z ∧ ¬Y) 
Represent this equation diagrammatically using a combination of AND, OR and NOT gates. 


Answer: 


[Circuit diagram: X and Y feed an AND gate; Y passes through a NOT gate and, with Z, feeds a second AND gate; the two AND outputs feed an OR gate, which outputs Q] 


Example 5 
Write the Boolean expression corresponding to the following logic circuit. 


[Circuit diagram with inputs A, B and C and output Q] 


Answer: Q = A ∨ ¬(B ∧ C) 
Exercises 
1. (a) Write a Boolean expression for P in the logic circuit shown in Figure 1. 
(b) Write a Boolean expression for R. 


(c) Draw the truth table for the logic circuit. 


[Figure 1: logic circuit with inputs A, B and C and labelled points P, R and Q] 
Figure 1 


A-Level only 
2. (a) Simplify the Boolean expressions below. 
(i) A ∨ B ∧ (A ∨ ¬B) 
(ii) ¬(A ∧ B) ∧ (¬A ∨ B) ∧ (¬B ∨ B) 


(b) Simplify the following Boolean expressions. 
(i) B ∧ (A ∨ ¬A) 
(ii) A ∧ B ∨ B 
(iii) (A ∨ ¬B) ∧ (A ∨ B) 


(c) Draw a logic circuit for the following Boolean expression: 


Q=AaBv aAAaC) 


3. (a) State the names of the logic gates represented by each of the truth tables below. 




Chapter 42 — Karnaugh maps 


Objectives 


• Simplify Boolean expressions using Karnaugh maps 


Introduction 
A Karnaugh map provides an alternative way of simplifying Boolean expressions which is often easier 
than using Boolean algebra for those involving up to three or four variables. It is similar to a truth table 
and allows us to easily detect groupings of expressions with common factors. 

The two-variable problem 


The figure below shows the correspondence between a truth table and a Karnaugh map. 


The values inside the squares are copied from the output column of the truth table, so there is one 
square in the Karnaugh map for every row in the truth table. Suppose we have the following truth table: 


For example, when A = 0 and B = 0, the output is 0. When A = 1 and B = 1, the output is 1. 
Example 1 
Use a Karnaugh map to simplify the expression Q = ¬A ∧ ¬B ∨ A ∧ ¬B ∨ ¬A ∧ B 


Group the expression into three sub-expressions separated by ∨. 
Q = (¬A ∧ ¬B) ∨ (A ∧ ¬B) ∨ (¬A ∧ B) 


Draw a blank Karnaugh map and fill in a 1 for the first sub-expression ¬A ∧ ¬B. Then insert a 1 for the 
second sub-expression A ∧ ¬B. Finally add a 1 for the sub-expression ¬A ∧ B. 


Now make groupings of 1, 2, or 4 ones, which can be overlapping. Each grouping should be as large as 
possible — in this case, the two groupings each consist of two squares. 


The pink group represents NOT A, and the blue group represents NOT B. Therefore the whole expression 
represents Q = NOT A OR NOT B, or in alternative notation, ¬A ∨ ¬B. 


This is the simplification of the expression Q = ¬A ∧ ¬B ∨ A ∧ ¬B ∨ ¬A ∧ B. 
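
The two-variable map for Example 1 can be built and checked in Python. This is a sketch only: the function name `q`, the choice of A for the rows and B for the columns, and the variable names are assumptions.

```python
def q(a, b):
    """Q = (NOT A AND NOT B) OR (A AND NOT B) OR (NOT A AND B)."""
    return bool((not a and not b) or (a and not b) or (not a and b))


# Rows are A = 0, 1 and columns are B = 0, 1 (an assumed orientation).
kmap = [[int(q(a, b)) for b in (0, 1)] for a in (0, 1)]

print("     B=0 B=1")
for a in (0, 1):
    print(f"A={a}   {kmap[a][0]}   {kmap[a][1]}")

# The simplified form NOT A OR NOT B gives the same output in every square.
simplified_ok = all(
    q(a, b) == ((not a) or (not b)) for a in (0, 1) for b in (0, 1)
)
print(simplified_ok)   # True
```

The top row of 1s corresponds to the group for NOT A and the left-hand column of 1s to the group for NOT B.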


The three-variable problem 


With three variables, each column can represent a combination of two variables. 


Example 2 
Represent the expression ¬A ∨ ¬B ∨ A ∧ B ∧ ¬C in a Karnaugh map, and hence simplify the expression. 

      BC 
 A    00   01   11   10 
 0 
 1 


Note: The order of terms along the top is not random: they are arranged so that each subsequent term 
reflects a change in only one variable. They are not in numerical sequence of 00, 01, 10, 11. 


The choice of whether to put A on its own, and group B and C together, or choose a different pair, and 
put for example C as the column heading and AB as the row heading, is not important, and will produce 
the same groupings. 
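
The column order 00, 01, 11, 10 is a Gray code, in which neighbouring values differ in exactly one bit. One standard way to generate it, shown here as a brief sketch with a hypothetical function name, is to combine each number with itself shifted right by one place using XOR.

```python
def gray_code(n_bits):
    """Return the n-bit Gray code sequence as binary strings."""
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]


def bits_changed(x, y):
    """Count the positions in which two equal-length bit strings differ."""
    return sum(1 for p, q in zip(x, y) if p != q)


headings = gray_code(2)
print(headings)   # ['00', '01', '11', '10']

# Each heading differs from the next by one bit, including the wrap-around
# from the last column back to the first.
one_bit_steps = all(
    bits_changed(headings[i], headings[(i + 1) % len(headings)]) == 1
    for i in range(len(headings))
)
print(one_bit_steps)   # True
```

The wrap-around check is what allows a group to extend off one edge of the map and continue on the opposite edge.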
First, divide the expression into sub-expressions, bracketing between the ∨ (OR) symbols, giving 
(¬A) ∨ (¬B) ∨ (A ∧ B ∧ ¬C) 


As before, we can now start filling in the table one step at a time, representing each sub-expression in turn. 


[Karnaugh maps filled in for (¬A) and then for (¬A) ∨ (¬B)] 


Notice that the green group has "wrapped around" and is counted as one group representing ¬C. 
These three groups together represent ¬A ∨ ¬B ∨ ¬C. 


This is the simplification of the expression. 


Example 3 
Use a Karnaugh map to simplify the expression (¬A ∧ B) ∨ (B ∧ ¬C) ∨ (B ∧ C) ∨ (A ∧ ¬B ∧ ¬C) 


Here, the group outlined in green “wraps around” but is still a single group. The expression simplifies to 
B ∨ (A ∧ ¬C) 


The four-variable problem 


With four variables, each row or column represents a combination of two variables. 


Example 4 
Represent the expression A ∨ (A ∧ ¬B ∧ C ∧ D) in a Karnaugh map, and hence simplify the expression. 


This simplifies to A. 


Summary of the Karnaugh map method 


1. Construct the Karnaugh map step by step, placing 1s in the squares for each sub-expression 
separated by an OR symbol (v) 


2. Group any octet (8 squares) 


3. Group any quad (4 squares) that has not already been grouped, making sure to use the minimum 
number of groups 


4. Group any pair which contains a 1 adjacent to only one other 1 which is not already in a group 


5. Group any isolated 1s which are not adjacent to any other 1s 


6. Form the OR sum of all the terms generated by each group 




Exercises 


1. A Karnaugh map is shown below. 


What Boolean expression does this map show? [1] 


2. A Karnaugh map is shown below. 


What Boolean expression does this map show? [2] 


3. Use Karnaugh maps in the format given below to simplify the following expressions: 


(a) (B ∧ C) ∨ (A ∧ C) ∨ (A ∧ B) ∨ (A ∧ ¬B ∧ ¬C) [3] 
(b) (¬A ∧ B) ∨ (B ∧ ¬C) ∨ (B ∧ C) [3] 

      BC 
 A    00   01   11   10 
 0 
 1 


4. Use a Karnaugh map to simplify the following expression. 
(AFAaA7BaCaD v(AaBaCaD v(AnaBaCaD)v(Aa-BaCaD) 
VAaBarCa-Dv(AnaBa-7CaD)v (AaBaCa-—bD) [4] 
A-Level only 


Chapter 43 — Adders and D-type flip-flops 


Objectives 


• Recognise and trace the logic of the circuits of a half adder and a full adder 
• Construct the circuit for a flip-flop 
• Be familiar with the use of the edge-triggered D-type flip-flop as a memory unit 


Performing calculations using gates 


With the right combination of gates, it is possible to output the result of a binary addition or subtraction 
including the value of any carry bit as a second output. 


Half adders 


A half adder can take an input of two bits and give a two-bit output as the correct result of an addition of 
the two inputs. 


This is shown by the diagram above and represented by the truth table, where S represents the sum and 
C represents the carry bit. S can be given as S = A ⊻ B, and C as C = A ∧ B. Although a half adder can 
output the value of a carry bit, it only has two inputs so it cannot use the carry from a previous addition 
as a third input to a subsequent addition in order to add n-bit numbers. 
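
The half adder's behaviour can be reproduced with two bitwise operations, one for each gate. The function name `half_adder` is hypothetical; the sketch simply prints the truth table for S and C.

```python
def half_adder(a, b):
    """Return (S, C): S = A XOR B is the sum bit, C = A AND B is the carry bit."""
    return a ^ b, a & b


print("A B | S C")
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} {b} | {s} {c}")
```

Reading C and S together as a two-bit number gives the arithmetic total, so 1 + 1 produces C = 1, S = 0, which is binary 10.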


Full adders 


A full adder combines two half adders to add three bits together: the two inputs A and B, and 
a carry-in bit Cin. The logic gate circuit below illustrates how two half adders have been connected with an 
additional OR gate to output the carry bit. 


Now the Boolean logic becomes S = A ⊻ B ⊻ Cin, and Cout = (A ∧ B) ∨ (Cin ∧ (A ⊻ B)). 
Concatenating full adders 


Multiple full adders can be connected together. Using this construct, n full adders can be chained so that 
the carry bit from each adder is input into the next adder along with two new inputs, creating a 
concatenated adder capable of adding two binary numbers of n bits. 


The four-bit adder is an example of a standard component that can be used in many applications 
involving arithmetic operations. 
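
A full adder built from two half adders and an OR gate, and a chain of full adders, might be sketched in Python as below. The function names and the choice to hold bits least significant first are assumptions made for illustration.

```python
def half_adder(a, b):
    return a ^ b, a & b                      # (sum, carry)


def full_adder(a, b, c_in):
    """Two half adders plus an OR gate, returning (S, Cout)."""
    s1, c1 = half_adder(a, b)                # first half adder adds A and B
    s, c2 = half_adder(s1, c_in)             # second adds the carry in
    return s, c1 | c2                        # OR gate combines the two carries


def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = 0
    total = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # the carry feeds the next adder
        total.append(s)
    return total, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


total, carry_out = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(total), carry_out)   # 1 1, since 11 + 6 = 17 = 16 + 1
```

With four bits the largest value that fits is 15, so the final carry out acts as a fifth bit of the answer.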


[Figure: a worked four-bit addition using four concatenated full adders, with inputs A3 B3, A2 B2, A1 B1 and A0 B0] 


D-type flip-flops 


A flip flop is an elemental sequential logic circuit that can store one bit and flip between two states, 0 
and 1. It has two inputs, a control input labelled D and a clock signal. 


The clock or oscillator is another type of sequential circuit that changes state at regular time intervals. 
Clocks are needed to synchronise the change of state of flip flop circuits. 


[Figure: clock signal waveform, labelling the clock period, width, rising edge and falling edge] 


The D-type flip-flop (D stands for Data or Delay) is a positive edge-triggered flip-flop, meaning that it 
can only change the output value from 1 to 0 or vice versa when the clock is at a rising or positive edge, 
i.e. at the beginning of a clock period. 


[Figure: D-type flip-flop block with a data input D and a clock signal input, and its outputs labelled Q] 


When the clock is not at a positive edge, the stored value is held and the output does not change. The flip-flop 
circuit is important because it can be used as a memory cell to store the state of a bit. 


Output Q only takes on a new value if the value at D has changed at the point of a clock pulse. 


This means that the clock pulse will freeze or ‘store’ the input value at D until the next clock pulse. 
If D remains the same on the next clock pulse, the flip-flop will hold the same value. 


The use of a D-type flip-flop as a memory unit 


A flip flop comprises several NAND (or AND and OR) gates and is effectively 1-bit memory. To store eight 
bits, eight flip-flops are required. Register memories are constructed by connecting a series of flip-flops 
in a row and are typically used for the intermediate storage needed during arithmetic operations. Static 
RAM is also created using D-type flip-flops. Imagine trying to assemble 16GB of memory in this way! 


The graph below illustrates how the output Q only changes to match the input D in response to the rising 
edge on the clock signal. Q therefore delays, or ‘stores’ the value of D by up to one clock cycle. 
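
The timing behaviour can be imitated with a short simulation in which the clock and D signals are sampled at regular steps. This is a simplified model, not a gate-level circuit: the function name, the sample signals, the initial output of 0 and the assumption that the clock was low before the first sample are all invented for illustration.

```python
def simulate_d_flip_flop(clock, d, q=0):
    """Return Q at each time step; Q copies D only on a 0 -> 1 clock transition."""
    outputs = []
    previous_clock = 0                          # assumed: clock low before the first sample
    for clk, data in zip(clock, d):
        if previous_clock == 0 and clk == 1:    # rising edge detected
            q = data                            # capture the value of D
        outputs.append(q)                       # otherwise Q holds its stored value
        previous_clock = clk
    return outputs


clock = [0, 1, 0, 1, 0, 1, 0, 1]
d     = [1, 1, 0, 0, 0, 1, 1, 0]

print(simulate_d_flip_flop(clock, d))   # [0, 1, 1, 0, 0, 1, 1, 0]
```

Notice that D falls to 0 at the third step but Q stays at 1 until the next rising edge, which is the delay described above.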
A-Level only 


Exercises 
1. A half-adder is used to find the sum of the addition of two binary digits. 
(a) Complete the diagram below to construct a half adder circuit. [3] 
[Diagram: inputs A and B, outputs S and C, with the gates to be completed] 


(b) Complete the following truth table for a half adder’s outputs S and C. 


[2] 
(c) How does a full adder differ from a half adder in terms of its inputs? [2] 
2. An edge-triggered D-type flip-flop can be used as a memory cell to store the value of a single bit. 
The following graph shows the clock cycle and the input signals applied to D. 
(a) Label each rising edge on the diagram below. [1] 
(b) Draw the flip-flop’s output Q on the graph. [4] 
Legal, moral, ethical and cultural issues 


In this section: 


Chapter 44 Computing related legislation 
Chapter 45 Ethical, moral and cultural issues 
Chapter 46 Privacy and censorship 
Chapter 44 — Computing related legislation 


Objectives 
• Be aware of computing related legislation, including: 
    o The Data Protection Act 1998 
    o The Computer Misuse Act 1990 
    o The Copyright, Designs and Patents Act 1988 
    o The Regulation of Investigatory Powers Act 2000 


• Understand that developments in digital technologies have enabled massive transformations in the 
capacity of organisations to monitor behaviour, amass and analyse personal information 


Introduction 


The rapidly changing field of computing and worldwide communications poses particular challenges 
to legislators. 


Countries have different laws, and it is sometimes hard to prove in which country an offence was 
committed, and equally hard to trace the offender or to prosecute. 


New applications in computing are constantly being invented and with them, new ways of committing 
offences for which there is no legislation. Legislators have to balance the rights of the individual with 
the need for security and protection from terrorist or criminal activity. Many countries, for example, have 
enacted legislation restricting or banning the use of strong cryptography. 


Computing related legislation 


Legislation relating to privacy can be broadly categorised into laws intended to protect personal 
privacy and those which have been passed in the interests of national security, crime detection or 
counter-terrorism. 


Some laws relate specifically to computing, for example: 


• the Data Protection Act (1998), which is designed to ensure that personal data is kept accurate, 
up-to-date, safe and secure and not used in ways which would harm individuals 


• the Computer Misuse Act (1990), which makes it an offence to access or modify computer material 
without permission 


• the Regulation of Investigatory Powers Act (2000) 


Other laws such as the Copyright, Designs and Patents Act (1988) have a more general application, 
covering the intellectual property rights of many types of work including books, music, art, computer 
programs and other original works. 
The Data Protection Act 1998 


The Data Protection Act says that anyone who stores personal details must keep them secure. 
Companies with computer systems that store any personal data must have processes and security 
mechanisms designed into the system to meet this requirement. 


The Act includes a number of principles: 

• data must be processed fairly and lawfully 
• data must be adequate, relevant and not excessive 
• data must be accurate and up to date 
• data must not be retained for longer than necessary 
• data can only be used for the purpose for which it was collected 
• data must be kept secure 
• data must be handled in accordance with people's rights 
• data must not be transferred outside the EU without adequate protection 


All data users must register with the Information Commissioner. 


The Computer Misuse Act 1990 


The Computer Misuse Act has three main principles, primarily designed to prevent unauthorised access 
or ‘hacking’ of programs or data. 


The Computer Misuse Act (1990) recognised the following new offences: 


• Unauthorised access to computer material 
• Unauthorised access with intent to commit or facilitate a crime 
• Unauthorised modification of computer material 
• Making, supplying or obtaining anything which can be used in computer misuse offences 


The Copyright, Designs and Patents Act 1988 


This Act is designed to protect the creators of books, music, video and software from having their work 
illegally copied. 


The Act makes it illegal to use, copy or distribute commercially available software 
without buying the appropriate licence. When a computer system is designed and 
implemented, licensing must be considered in terms of which software should be 
used. If you use commercial software called, for example, TestSoft to create a series 
of multiple choice tests called ReviseHistory, it may not be permissible to sell your 
finished product without paying TestSoft a fee for every copy you sell. 


Similarly, if your school buys a copy of ReviseHistory, they may not be permitted to install it on more than 
one computer without buying a multi-user licence for a certain number of users. 
If you buy a music CD or pay to download a piece of music, software or a video, it is illegal to: 


• pass a copy to a friend 
• make a copy and then sell it 
• use the software on a network, unless the licence allows it 


The software industry can take some steps to prevent illegal copying of software: 


• The user must enter a unique key before the software is installed 
• Some software will only run if the CD is present in the drive 
• Some applications will only run if a special piece of hardware called a ‘dongle’ is plugged into a USB 
port on the computer 


However, although a piece of software such as an applications package, game or operating system is 
protected, algorithms are not eligible for protection. If you come up with a much better sorting algorithm 
than anyone else, for example, you cannot stop others from using it. 


The Regulation of Investigatory Powers Act 2000 


This Act regulates the powers of public bodies to carry out surveillance and investigation, and covers 
the interception of communications. It was introduced to take account of the growth of technology, the 
Internet and strong encryption, and additions have been made regularly between 2003 and 2010, with 
the latest draft bill out before Parliament in November 2015. 


The Act: 


* enables certain public bodies to demand that an ISP provide access to a customer's communications 
in secret 


* enables mass surveillance of communications in transit 

* enables certain public bodies to demand ISPs fit equipment to facilitate surveillance 

* enables certain public bodies to demand that someone hand over keys to protected information 
* allows certain public bodies to monitor people's Internet activities 


* prevents the existence of interception warrants and any data collected with them from being revealed 
in court 


Analysing personal information 


According to the head of a 2006 Royal Academy study into surveillance, Google is within a few years of 
having sufficient information to be able to track the exact movements and intentions of every individual, 
via Google Earth and other software they are developing. 


It is predicted that small computers will become embedded in everything from clothes to beermats. 
Consequently, we will be interfacing with computers in everything we do, from meeting chip-wearing 
strangers to entering smart buildings or sitting on a smart sofa, and each of these interfaces will end 
up on a Google database. 


It is a vision of a world without privacy. 


Already, Google collects and stores data about millions of emails every day. Here are some extracts 
from the information they post on their website, which users must agree to if they wish to use 
Google software. 
To be consistent with data protection laws, we're asking you to take a moment 
to review key points of Google's Privacy Policy. This isn't about a change that 
we've made - it's just a chance to review some key points. 


Data we process when you use Google 


* When you search for a restaurant on Google Maps or watch a video on YouTube, for 
example, we process information about the activity - including information like the video 
you watched, device IDs, IP addresses, cookie data and location. 


* We also process the kind of information described above when you use apps or sites 
that use Google services like ads, Analytics and the YouTube video player. 


Why we process it 


We process this data for the purposes described in our policy, including to: 


* Help our services deliver more useful, customised content such as more relevant 
search results; 


* Improve the quality of our services and develop new ones; 


* Deliver ads based on your interests, including things like searches you've done or 
videos you've watched on YouTube; 


* Improve security by protecting against fraud and abuse; and 


* Conduct analytics and measurement to understand how our services are used. 


Organisations, including governments and security agencies, 
collect huge amounts of data about private citizens, often 
supplied by Internet companies such as Google, as well as by 
telephone companies. 


With the aim of detecting terrorist or other illegal activities, the 
US Government collects, stores and monitors metadata about 
all electronic communications in the US. Metadata includes 
information such as the telephone number called, date, time and 
duration of call. 


In one month in 2013, one unit of the US National Security Agency collected data on more than 97 
billion emails and 124 billion phone calls from around the world. 
Edward Snowden is a famous ‘whistle-blower’ who informed the 
world about these practices. 
Case Study: Edward Snowden 


In April 2013, Guardian journalist Glenn Greenwald and Academy Award-winning documentary film 
director and producer Laura Poitras met in the Marriott Hotel in New York to discuss an initial contact 
with an anonymous “whistle-blower”. Seated in the hotel restaurant, Laura Poitras asked Glenn to either 
remove the battery from his cell phone or leave it in the hotel room. “It sounds paranoid,” she said, 
“but the government has the capability to activate cell phones and laptops remotely as eavesdropping 
devices. Turning off the phone or laptop does not defeat the capability; only removing the battery does.” 


The anonymous source had refused to email any details of what material he had to offer until Glenn 
installed PGP on his computer. PGP, which stands for “Pretty Good Privacy”, is a sophisticated tool to 
prevent online communications from being hacked. The encryption keys are so long and random 
that it would take years to decrypt a communication without them. But it was complicated to install and it took Glenn 
several months to get round to it, before he was eventually talked through the process online by his 
anonymous contact. Only then did he receive information from his source about a program called PRISM, 
which allowed America's National Security Agency (NSA) to collect private communications from the 
world's largest Internet companies, including Facebook, Google, Yahoo, Microsoft, Apple, YouTube, AOL 
and Skype. 


The first document that Glenn opened was a training manual to teach analysts about the new surveillance 
capabilities. It told analysts how they could query, for example, particular email addresses or telephone 
numbers and what data they would receive in response. 


What did the source hope to achieve by exposing the secret surveillance practices of the NSA? 


“I want to spark a worldwide debate about privacy, Internet freedom, and the dangers of state 
surveillance,” he stated. “I’m not afraid of what will happen to me. I’ve accepted that my life will be over 
from my doing this. I'm at peace with that. I know it’s the right thing to do.” 


The next step was for Glenn and Laura to travel to Hong Kong to meet the whistleblower — Edward 
Snowden, a 29-year-old who had worked since 2005 as a technical expert for the CIA, NSA and its 
sub-contractors, making around $200,000 in salary and bonuses. He had travelled to Hong Kong in May, 
staying in a hotel under his own name, figuring he was safer there than staying in the US when news of 
the leaked documents broke. 


“I watched NSA tracking people’s Internet activities as they typed. I became aware of just how invasive 
US surveillance capabilities had become. I realised the true breadth of this system. And almost nobody 
knew it was happening. 


“For many kids, the Internet is a means of self-actualisation. It allows them to explore who they are and 
who they want to be, but that works only if we're able to be private and anonymous, to make mistakes 
without them following us. I worry that mine will be the last generation to enjoy that freedom. I do not 
want to live in a world where we have no privacy and no freedom, where the unique value of the Internet 
is snuffed out.” 


On 6th June 2013, the first of many articles was published by the Guardian. 


NSA collecting phone records of millions of Verizon customers daily 


Exclusive: Top secret court order requiring Verizon to hand over all call data shows scale of domestic 
surveillance under President Obama. 


The order, signed by Judge Roger Vinson, compels Verizon to produce to the NSA electronic copies 
of “all call detail records or ‘telephony metadata’ created by Verizon for communications between the 
United States and abroad” or “wholly within the United States, including local telephone calls”. 
As journalist Glenn Greenwald painstakingly sifted through the mountain of information provided by 
Snowden, he was shocked at the extent of the American surveillance operation. It included the NSA's 
tapping of Internet servers, satellites, underwater fibre-optic cables, local and foreign telephone systems 
and personal computers. A list of individuals targeted for particularly invasive forms of spying included 
terrorist and criminal suspects, democratically elected leaders of many countries in Europe including 
France and Germany, and ordinary American citizens. 


The documents leaked by Snowden revealed that the literal aim of the US Government was to collect, 
store, monitor and analyse metadata about all electronic communications by everybody in the world. 


Exercises 


1. Do you think Edward Snowden was right to reveal the secret documents to which he had 
access, being legally forbidden to do so under the US Espionage Act 1917? Justify your answer. [4] 


2. The FBI and NSA have been protesting about losing surveillance capabilities—through greater 
encryption of the Internet—since the 1990s. In China, the manufacture, use, sale, import, or 
export of any item containing encryption without prior government approval may lead to 
administrative fines, the seizure of equipment, confiscation of illegal gains, and even 
criminal prosecution. 


Give arguments for and against a policy of making it illegal for individuals and organisations to 
use strong encryption in their online communications. [4] 


3. The Data Protection Act 1998 sets out eight principles for the protection of privacy in data 
collection, handling and distribution. Name two of these principles and explain how each 
serves to protect privacy. [4] 


4. What Act provides intellectual property protection for software? What actions are illegal 
under this Act? [3] 
Chapter 45 — Ethical, moral and cultural issues 


Objectives 


* Discuss the individual (moral), social (ethical) and cultural opportunities and risks of digital technology: 
  o computers in the workforce 
  o automated decision making 
  o artificial intelligence 
  o environmental effects 
  o analysis of personal information 


* Understand the real and potential impact that digital technology has on employment, the distribution 
of wealth and the lives of millions of people 


* Discuss the environmental effects of computers 


The economic impact of the Internet 


The Internet has its origins in the 1960s with ARPANET, the first North American wide area network. In 
1974, two engineers called Bob Kahn and Vint Cerf devised a protocol for linking up individual networks 
into what they termed the Internet — the “internetworking of networks”. 


In the 1980s Tim Berners-Lee, working for CERN in Geneva, invented, or designed, the World Wide 
Web. He wrote his initial Web proposal in March 1989 and in 1990 built the first Web browser, called 
WorldWideWeb. His vision was that “all the bits of information in every computer in CERN, and on the 
planet, would be available to me and to anyone else. There would be a single global information space.” 


Berners-Lee had little interest in money and gave away his technology for nothing, but one of the most 
significant consequences of his invention was a complete reshaping of the economy throughout the 
world. Has it created jobs, or simply created the “1% economy” in which the top Internet companies like 
Amazon, Google, Facebook, Instagram and others have accumulated huge wealth at the expense of 
thousands of workers? 


Amazon 


Amazon started as an online bookstore in 1994 but soon diversified into DVDs, software, video games, 
toys, furniture, clothes and thousands of other products. In 2013 the company turned over $75 billion 
in sales, and it now accounts for 65% of all digital book purchases. As a consequence of its 
domination, in 2015 there were fewer than 1,000 independent bookstores in Britain, one third fewer than 
in 2005. Where a bookshop employs 47 people for every $10 million in sales, Amazon employs 14 to 
generate the same revenue. 


eBay 
eBay, essentially an electronic platform bringing together buyers and sellers of goods, grew from a user 
$227.9 billion in 2014. 


Google 


In 1996, Larry Page and Sergey Brin, two Stanford University Computer Science postgraduate students, 
created Google. There were already several successful search engines like Yahoo and AltaVista on the 
market, but Page and Brin came up with a game-changing algorithm, which they called PageRank, for 
determining the relevance of a Web page based on the number and quality of its incoming links. The 
idea was that you could estimate the importance of a Web page by the number and status of other 
web pages that link to it. Every time you make a search, the Google search engine becomes more 
knowledgeable and thus more useful. Even more valuable to Google is the fact that Google learns more 
about you every time you search. 
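

The idea of ranking a page by the number and status of the pages that link to it can be illustrated with a short Python sketch. This is a simplified illustration only and not Google's actual implementation: the four-page web in `links`, the function name `page_rank`, the damping factor of 0.85 and the fixed number of iterations are all assumptions made for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Estimate the importance of each page from its incoming links.

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}      # start with equal importance

    for _ in range(iterations):
        # every page receives a small baseline share
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)  # a page splits its rank among its links
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank
    return rank


# Hypothetical web of four pages (illustrative data only)
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sketch, page C comes out on top because three pages link to it, and page A ranks second even though it has only one incoming link, because that link comes from the highly ranked page C. Page D, which nothing links to, ranks lowest.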


By 1998, Google was getting 10,000 queries every day. By 1999, they were getting 70 million daily 
requests. Their next step was to figure out how to make money out of their free technology, and they 
came up with AdWords, which enabled advertisers to place keyword-associated ads down the 
right-hand side of the page. The image below shows what comes up when a user in Dorchester searches 
for “Paintball”, with the nearest companies, sponsored advertisements and a map with their locations 
appearing on the right of the screen. 


By 2014, Google had joined Amazon as a winner-takes-all company, with 1.5 billion daily searches and 
revenues of $50 billion. 


Computers in the workforce 
A 2013 paper by Carl Benedikt Frey and Michael Osborne entitled “The Future of Employment: How 
Susceptible Are Jobs to Computerisation?” estimates that 47% of total US employment is at risk. They 
examine the impact of future computerisation on more than 700 individual occupations, and note the 
shifting of labour from middle-income manufacturing jobs to low-income service jobs which are less 
susceptible to computerisation. At the same time, with falling prices of computing, problem-solving 
skills are becoming relatively productive, explaining the substantial employment growth in occupations 
involving cognitive tasks where skilled, well-educated labour has a comparative advantage. 


Thus there is a polarisation of labour, with growing employment in high-income cognitive jobs and 
low-income manual labour, and the disappearance of middle-income occupations. Driverless cars developed 
by Google are an example of how computerisation is no longer confined to routine manufacturing tasks. 
The possibility of drones delivering your parcels is no longer in the realms of science fiction. In the 10 jobs 
that have a 99% likelihood of being replaced by software and automation within the next 25 years, the 
authors include tax preparers, library assistants, clothing factory workers, and photographic 
process workers. 
In fact, jobs in the photographic industry have already all but vanished. In 1989, when Tim Berners-Lee 
invented the World Wide Web, Kodak employed 145,000 people in research labs, offices and factories 
in Rochester, US, and had a market value of $31 billion. In 2012 the company filed for bankruptcy and 
Rochester became virtually a ghost town. 


Meanwhile, in 2010, a young entrepreneur called Kevin Systrom started up Instagram, which enabled 
users to create photos on their smartphones with filters to give them, for example, a warm, fuzzy glow. 



An Instagram moment 


Twenty-five thousand iPhone users downloaded the app when it launched on 6th October 2010. A 
month later, Systrom’s Instagram had a million members. By early 2012, it had 14 million users and by 
November, 100 million users, with the app hosting 5 billion photos. But when Systrom sold Instagram 
to Facebook for a billion dollars in 2012 (less than two years after the startup), Instagram still only had 
thirteen full-time employees working out of a small office in San Francisco. It is a good example of a 
service that is not providing any jobs at all in the winner-takes-all economics of the digital marketplace. 


User generated content 


In his book “The Cult of the Amateur”, Andrew Keen argues that “MySpace and Facebook are creating a 
youth culture of digital narcissism; open-source knowledge-sharing sites like Wikipedia are undermining 
the authority of teachers in the classroom; the YouTube generation are more interested in self-expression 
than in learning about the outside world; the cacophony of anonymous blogs and user-generated content 
is deafening today’s user to the voices of informed experts and professional journalism; kids are so 
busy self-broadcasting themselves on social networks that they no longer consume the creative work of 
professional musicians, novelists, or filmmakers.” 


Keen asserts that a thriving music, video and publishing economy is being replaced by the multi-billion 
dollar monopolist YouTube. The traditional copyright-intensive industries accounted for almost 510 billion 
euros in the European Union during the period 2008-2010, and generated 3.2% of all jobs, amounting to 
more than 7 million jobs. What will happen if large numbers of these jobs disappear? 
Algorithms and ethics 


Computer scientists and software engineers who devise the multitude of algorithms used by YouTube, 
Facebook, Amazon and Google, and by organisations from banks and Stock Exchanges to the Health 
Service and the police, have significant power and therefore the responsibility that goes with it. In some 
US cities, algorithms determine whether you are likely to be stopped and searched on the street. 
Banks use algorithms to decide whether to consider your application for a mortgage or a loan. 
Algorithms are applied to decision-making in hiring and firing, healthcare and advertising. It has been 
reported, for example, that some algorithms which decide what advertisements are shown on your 
browser screen classify web users into categories which include “probably bipolar”, “daughter killed in 
car crash”, “rape victim”, and “gullible elderly”. Did the programmer who wrote that algorithm have 
any qualms about their work? 


When algorithms prioritise, they “bring attention to certain things at the expense of others”. 


Facebook’s ‘News Feed’ product filters posts, stories and activities undertaken by friends. Content for 
the News Feed is selected or omitted according to a ranking algorithm which Facebook, with its 
billion-plus user base, continually develops and tests to show users the content they will be most interested in. 
But it has been suggested that these social interactions may influence people's emotions and state of 
mind; the emotions expressed by friends via online social networks may influence our own moods and 
behaviour. Clearly, then, those who devise the ranking algorithms potentially have the ability to influence 
the emotional state of people using Facebook. 


Should computer scientists consider the institutional goals of a prospective employer, or the social worth 
of what they do, before accepting a job? Phillip Rogaway, Professor of Computer Science at the University 
of California, found that, in a Google search on deciding among job offers, not one result suggested that this 
was a factor. 


Driverless cars 


The prospect of large numbers of self-driving cars on our roads raises ethical questions about the 
morality of automated decision making and different algorithms which could be used in the face of 
causing “unavoidable harm” - who gets harmed and who gets spared. 




(a) The car can stay on course and kill several pedestrians, or swerve and kill one passer-by 
(b) The car can stay on course and kill one pedestrian, or swerve and kill its passenger 
(c) The car can stay on course and kill several pedestrians, or swerve and kill its passenger 


The MIT Technology Review asked: “Should different decisions be made when children are on board, 
since they both have a longer time ahead of them than adults, and had less say in being in the car in the 
first place? If a manufacturer offers different versions of its moral algorithm, and a buyer knowingly chose 
one of them, is the buyer to blame for the harmful consequences of the algorithm’s decisions?” 
One of the principles that form a commonly held set of pillars for moral life is the 
obligation not to inflict harm intentionally; in medical ethics, the physician's guiding principle is “Do no 
harm”. Going further, the moral duties of all scientists, including computer scientists, should also include 
trying to promote the common good. 


Artificial intelligence 


As digital technologies are used in more and more areas of our lives, spreading into our offline 
environments through the so-called ‘Internet of things’, previously inert objects are expected to become 
networked and start making decisions for us. Algorithms will allow the refrigerator to decide what food 
needs replacing, and a door to decide who to let in. Should your door call the police if the door is opened 
by someone without a tracking device? Should your house report a child who screams excessively to the 
Social Services? 


Environmental effects of computers 


Environmental issues include the carbon footprint and waste products that result from manufacturing 
computer systems, but these are often outweighed by the positive effects on the environment of using 
computerised systems to manage processes that might otherwise generate more pollution. 


Considerations may include: 


* Does a computer system mean that people can work from home and therefore drive less? 


* Has computer technology led to a “throw-away society”, with huge waste dumps of unwanted 
products which are thrown away rather than repaired or upgraded? 


* Is working at home more environmentally friendly than everyone working in a big office, in terms of 
heating and lighting? 


* Do computer-managed engines work more efficiently, create less pollution and use less fuel? 


Computers and waste 


The pace of technology is so rapid that computers, mobile phones and handheld devices that seemed 
so desirable a few short years ago are now discarded without a thought for the latest must-have piece of 
equipment. Are they recyclable or are they simply contributing to a huge mountain of waste, containing 
dangerous chemical elements which leach into water supplies in third world countries? 
Exercises 


1. Some of the jobs likely to disappear over the next decade owing to computerisation include 
manufacturing jobs, clerical jobs and even service jobs, where people will be replaced by robots. 
Give examples of other jobs which may be lost owing to computerisation. What will be the social 
effects of the job losses? [7] 


2. Decisions are often made about us on the basis of algorithms of which we may be completely 
unaware. Car insurance premiums are calculated based largely on your age, experience, address, 
occupation and vehicle details. Health insurance premiums are affected by age, occupation, personal 
and parental medical histories. Are the algorithms that calculate these premiums fair? Discuss how 
the algorithms used embed moral and/or cultural values. State with reasons who benefits from the 
decisions made by these algorithms and whether anyone is harmed. [4] 


3. Computers have had a considerable impact on our environment. Describe an environmental 
problem to which the industry contributes and what measures individuals can take to help solve 
this problem. [4] 
Chapter 46 — Privacy and censorship 


Objectives 


* Discuss the cultural opportunities and risks of digital technology relating to: 
  o censorship and the Internet 
  o the monitoring of behaviour 
  o piracy and offensive communications 
  o layout, colour paradigms and character sets 


Trolls on the Internet 


Trolls, cyber-bullying and misogyny have become a fact of everyday life on the Internet. It wasn't 
supposed to be this way — the Internet was going to inspire a generation to voice a broad diversity of 
opinion and empower those who traditionally had no voice. 


After the 2010-11 Arab Spring, many people argued that the social media networks were helping to 
overthrow dictatorships and empower the people. But the Arab Spring deteriorated into vicious religious 
and ethnic civil wars, culminating in the rise of the so-called ISIS, which uses social networks to post 
atrocities and radicalise impressionable young people. 


Feminist writers and journalists, academics like Mary Beard and political campaigner Caroline 
Criado-Perez, who petitioned the Bank of England to create a bank note featuring Jane Austen's face, receive 
hundreds of death threats, rape threats and other offensive communications for no other reason than that 
they are women who have dared to appear in the media. Thousands of other women and teenage girls 
are victims of similar trolling on the Internet. Savage bullying on various social networking sites has led to 
several tragic cases of suicide. 


The Internet has brought great benefits, but all of us have a responsibility to use it wisely and well. 


Censorship and the Internet 


Internet censorship is the control or suppression of what can be accessed, viewed or published on the 
Internet. It may be carried out by governments or by private organisations in response to government 
regulators. Individuals and organisations may censor certain websites for moral, religious or business 
reasons, or from fear of intimidation or legal consequences. For example, websites containing copyright 
infringements, harassment or obscene material may be censored. 


The extent of censorship varies from country to country, and many of the issues associated with Internet 
censorship are similar to those of traditional censorship of newspapers, books, films, etc. It is more difficult to 
censor Internet information in one particular country, since the information can generally be found on 
websites hosted outside the country. In some countries such as North Korea and Cuba, the government 
has total control over all Internet-connected computers, and can therefore enforce censorship. 


Most people agree that there needs to be some form of censorship on the Internet; in a 2012 Internet 
Society survey, 71% of respondents agreed that “censorship should exist in some form on the Internet”. 
Case study 1: Online abuse 


In 2016, comments on the Guardian newspaper website regularly exceeded 70,000 per day. Journalists 
put their names to all the articles they write, and regular columnists frequently have their photograph 
accompanying the column. One consequence of this is that gender and race appear to be key factors in 
attracting abuse. In a study of almost 70 million comments posted on the Guardian website, it was found 
that eight of the top ten Guardian Opinion writers most likely to attract abusive or off-topic comments 
below their articles were women, while the other two were black men. As well as the gender and race of 
the author, other factors appeared to be significant: one of the women was Jewish, one was Muslim and 
two were lesbian, while one of the two men was gay. 


Despite white men forming the majority of Guardian Opinion writers, the 10 columnists attracting the least 
abuse or off-topic comments were all men (nine of them white). 


One female journalist writing about a demonstration outside an abortion clinic was told “You are so ugly 
that if you got pregnant I would drive you to the abortion clinic myself”. A British Muslim woman writing 
about Islamophobia was told to “marry an ISIS fighter and then see how you like that!” 


As one journalist said, “Even if I tell myself the abuse doesn't mean anything, it has a toll on me. It has an 
emotional effect, it takes a physical toll. And over time, it builds up.” Another said, “Imagine going to work 
every day and walking through a gauntlet of 100 people saying ‘You're stupid’, ‘You're terrible’, ‘I can't 
believe you get paid for this’.” 


In April 2016 it was reported that Google, Facebook and Twitter were talking to organisations around 
the world to organise a global counter-speech movement against the violent misogyny, racism, threats, 
intimidation and abuse that flood social media platforms. 


Case study 2: Monitoring content on the Guardian website 


Almost every website, whether it be a newspaper or personal blog, has struggled with comments. 
A really good comment “informs its readers, corrects authors and provides worthwhile insights in a polite 
and constructive manner”. Other comments fall into the category of rants, bile, insults and trolling. 


The majority of comments are civil and productive, and engaging with comments is part of a journalist’s 
work. Many factors affect the success of commenting at news sites: topic, user anonymity, scale, site 
culture, moderation, journalists’ engagement and attitudes, and management support. 


Newspapers such as The Guardian employ a team of moderators to read comments and block or delete 
offensive ones. One moderator in April 2016 described how over the past five years he has read millions 
of comments and blocked tens of thousands. Moderation is about preventing anyone’s agenda, or rants 
about irrelevant issues, from ruining the conversation, as well as blocking trolls. 


An irony of successful discussion forums is that their success begets their failure. They get too big and 
attract spammers, scammers and trolls. 
Monitoring behaviour 


We are all used to our movements and behaviour being caught on camera, in town 
and on the roads. CCTV cameras are used for security purposes, crime prevention 
and detection. They are used to record drivers speeding, turning or parking illegally or 
driving the wrong way up a one-way street. 


Employers may monitor employee behaviour on the Internet, recording what sites are 
visited during working hours and how much time is spent on them. 


And, of course, you can use wearable technology to monitor your own behaviour — how many 
steps you have taken during the day, your heart rate during a run, the time you took to swim 100 metres. 


Layout, colour paradigms and character sets 


Websites designed in one country are viewable all over the world, so if they are intended for an 
international viewership, it is a good idea to give consideration to layout, colour and character sets. 


Layout 


Most websites are designed based on the US layout containing a linear structure of information with 
multiple blocks of text that a western reader is likely to skim over. With Japanese websites, for example, 
the preference is to include less information per page which, as a whole, is easier to absorb without fear 
of missing something. In the West, where text is read from left to right, menus are commonly placed on 
the left. In other countries, where Arabic script, for example, is read from right to left, menus and other 
page features might more logically appear mirrored in comparison with western versions of the same page. 


Maps are a good example of the use of cultural or nationalistic bias reflected in layout. A world map is 
frequently shown with the country where it was created appearing in the centre. 


Map 1: The Americas in the centre 


Colour paradigms 


Around the world, the way that different cultures see and describe colours varies dramatically. In general, 
blue is considered the safest colour choice around the world, since it has many positive associations. 
In North America and Europe, blue represents trust, security, and authority, and is considered to be 
soothing and peaceful. However, it can also represent depression, loneliness, and sadness (hence having 
“the blues”). 


In Western cultures, green represents luck, nature, freshness, spring, environmental awareness, wealth, 
inexperience, and jealousy (the “green-eyed monster”). In Indonesia, green has traditionally been 
forbidden, whereas in Mexico, it’s a national colour that stands for independence. In the Middle East, 
green represents fertility, luck, and wealth, and it's considered the traditional colour of Islam. In Eastern 
cultures, green symbolizes youth, fertility, and new life, but it can also mean infidelity. In fact, in China, 
green hats for men are taboo because it signals that their wives have committed adultery! 
In Western cultures, orange represents autumn, harvest, warmth, sunshine. In Hinduism, saffron (a 
soft orange colour) is considered auspicious and sacred. In Eastern cultures, orange symbolizes love, 
happiness, humility, and good health. 


Look up http://www.shutterstock.com/blog/the-spectrum-of-symbolism-color-meanings-around-the-world 
to see the symbolism of other colours in countries around the world. 


Character sets 
A character set is the mapping of a collection of characters to specific bit sequences or codes. The 
collection can increase in number dependent on the maximum number of bits allocated to each 
character. ASCII uses only seven bits, allowing for 128 characters, whereas Unicode (UTF-16) has been 
developed to represent over a million characters, including those of most languages and the symbols used 
in mathematical, scientific and musical notation. Unicode and ASCII are covered in more detail in Chapter 29. 
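
As a rough illustration of the difference in range, the Python sketch below prints the code number and the UTF-16 bytes for three sample characters. The sample characters and the helper name ascii_bits are chosen only for this example.

```python
# Three sample characters: one inside the 7-bit ASCII range, two outside it.
samples = ["A", "é", "€"]


def ascii_bits(ch):
    """Return the 7-bit pattern for ch (assumes ch is an ASCII character)."""
    return format(ord(ch), "07b")


for ch in samples:
    code = ord(ch)                          # the number the character maps to
    utf16 = ch.encode("utf-16-be").hex()    # the same character as UTF-16 bytes
    kind = "ASCII" if code < 128 else "not ASCII"
    print(ch, code, utf16, kind)

print(ascii_bits("A"))
```

Only the first character fits in seven bits; the other two need a larger character set such as Unicode.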


Exercises 


1. Networking sites frequently feature angry, violent or inaccurate content. 


Should Facebook, Twitter, Ask.com and others take responsibility for content posted on their 
sites? What sort of content should be allowed? Would it be possible to develop software to 
facilitate such a task? Discuss. [5] 


2. “Honest and law-abiding citizens have nothing to fear from the distribution of their personal data.” 
Do you agree with this statement? Give reasons for your view and a reason why someone else 
might not agree with you. [5] 


3. A University is debating whether to offer a course on writing malware such as viruses, worms 
and Trojan horses. Discuss the ethical issues involved in this decision, and whether or not you 
think they should run the course. [5] 
Chapter 47 — Thinking abstractly 


Objectives 


• Understand the nature of and need for abstraction 
• Describe the differences between an abstraction and reality 
• Devise an abstract model for a variety of situations 


Computational thinking 


What is computational thinking? It is not about following an algorithm in one’s head to carry out a 
mathematical task like adding ten numbers. Rather, it is about thinking how a problem can be solved. 
This involves two basic steps: 


• Formulate the problem as a computational problem; in other words, state it in such a way that it is 
  potentially solvable using an algorithm 


• Try to construct an algorithm to solve the problem 


A computational thinker will not be satisfied with any old algorithm, though; it must be a ‘good’ solution — 
that is, a correct and efficient solution. A programmer needs to be able to show that a solution is correct 
and efficient by using logical reasoning, test data and user feedback. 


Clearly, then, computational thinking is a vital skill for a programmer, and in fact it is not possible to be 
a programmer without it. It includes the ability to think logically and to apply the tools and techniques of 
computing to thinking about, understanding, formulating and solving problems. 


Computing has been called the automation of abstractions, so let's move on to talk about abstraction. 


Abstraction 


Representational abstraction can be defined as a representation arrived at by removing 
unnecessary details. 


Here are some examples of abstraction. 


• Any computer model, say of the environment, a new car or a flight simulator, is an abstraction. 


• If you are planning to write a program for a game involving a bouncing ball, you will need to decide 
  what properties of the ball to take into account. If it's bouncing vertically rather than, say, on a 
  snooker table, gravity needs to be taken into account. How elastic is the ball? How far and in what 
  direction will it bounce when it hits an edge? What you are required to do is build an abstract model 
  of a real-world situation, which you can simplify; remembering, however, that the more you simplify, 
  the less likely it becomes that the model will mimic reality. 


• A builder who is planning to build 100 houses on a new estate may use a physical model of the new 
  estate, or in the first instance, a plan on paper or on a computer screen. In either case the model 
  will be greatly simplified. All the houses may appear identical in the model. They may lack windows, 
  doors or chimneys. All the trees in the model may be of identical size, colour and shape. 


• The map of the London Underground is a simple model of the actual geography of the Tube stations. 
  The map tells you what line each station is on and which other lines each station is connected to. It is 
  very useful for a person travelling around London, but of very little use to an engineer who is planning 
  where to dig tunnels for a proposed new line. 


All of these models contain different types of abstraction which are used in programming. In 
programming, abstraction is concerned with the distinction between what a program unit does and how 
it does it. 


Abstraction applied to high level programming languages 


Abstraction is the most important feature of high level programming languages such as Python, C#, Java 
and hundreds of other languages written for different purposes. To understand why, we need to look at 
different generations of programming language. 


• The first generation of language was machine code: programmers entered the binary 0s and 1s 
  that the computer understands. Writing a program to solve even a short, simple problem was a 
  tedious, time-consuming task largely unrelated to the algorithm itself. 


• The second generation was an improvement; mnemonic codes were used to represent instructions. 
  But as you saw in Chapter 14, it is still an enormously complex task to write an assembly language 
  program and what's more, if you want to run the program on a different type of computer, it has to be 
  completely rewritten for the new hardware. 


• The third generation of languages, starting with FORTRAN in the 1950s and BASIC in the 1960s, used 
  statements like X = A + 5, freeing the programmer from all the tedious details of where the variables 
  X and A were stored in memory, and all the other fiddly implementation details of exactly how the 
  computer was going to carry out the instruction. 


Finally, programmers could focus on the problem in hand rather than worrying about irrelevant 
technological details, and that is a good example of what abstraction is all about. 
Abstraction by generalisation 


There is a famous problem dating back more than 200 years to the old Prussian city of Königsberg. 
This beautiful city had seven bridges, and the inhabitants liked to stroll around the city on a Sunday 
afternoon, making sure to cross every bridge at least once. Nobody could figure out how to cross each 
bridge once and once only, or alternatively prove that this was impossible, and eventually the Mayor 
turned to the local mathematical genius Leonhard Euler. 


The map of 18th century Königsberg 


Euler's first step was to remove all irrelevant details from the map, and come up with an abstraction. 
To really simplify it, Euler represented each piece of land as a circle and each bridge as a line between 
them. 


What he now had was a graph, with nodes representing land masses and edges (lines connecting the 
nodes) representing the bridges. Now that Euler had his graph, how could he solve the problem? 


He did not want to try every possible solution; he realised that this was just a particular instance of 
a more general problem and he wanted to find a solution that was applicable to similar problems. 


He noticed a critical feature of the puzzle: since each bridge could be crossed only once, each node had 
to have an even number of connections, because you must enter and leave a node by a different edge. 
The only exceptions are the start and end node, since you don’t have to enter a start node or leave 
the end node. 


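All four nodes in this graph have an odd number of edges, whereas a route crossing every bridge exactly 
once allows at most two such nodes (the start and the end), so it is therefore impossible! 
Euler had laid the foundation of graph theory, which you met in Chapter 38, with more in Chapter 63. 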
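
Euler's reasoning can be sketched in a few lines of Python. The names given to the four land masses here (north, south, island, east) are labels invented for this example, and the check assumes the graph is connected, as it is for Königsberg.

```python
from collections import Counter

# Each tuple is one bridge joining two land masses (labels are illustrative).
bridges = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("north", "east"), ("south", "east"),
    ("island", "east"),
]


def odd_nodes(edges):
    """Return the nodes that have an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def route_possible(edges):
    """A route using every edge exactly once needs 0 or 2 odd nodes
    (assuming the graph is connected)."""
    return len(odd_nodes(edges)) in (0, 2)


print(odd_nodes(bridges))
print(route_possible(bridges))
```

Because the function works on any list of edges, the same code answers the question for a different city with a different number of bridges, which is the point of the generalisation.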


By abstracting the problem, Euler made possible the solution of innumerable related problems. Not only 
does it apply to different cities with different numbers of bridges, it applies to many other problems with 
similar requirements. 


Abstraction by generalisation, as illustrated above, is a grouping by common characteristics to arrive at a 
hierarchical relationship of the “is a kind of” type. Thus Euler's problem is a particular instance of 
graph theory. 


This type of abstraction is very common in object-oriented programming. A class of object, say an 
Animal, will be defined with its own attributes such as gender and whether it is carnivore or vegetarian, 
and its own behaviours, methods or procedures such as move, sleep, eat, etc. Other objects such 
as Dog, Cat, Mouse and so on may be defined as subclasses of Animal: they all share common 
characteristics which are defined in the Animal class, but have their own attributes and behaviours as 
well. In other words Dog “is a kind of” Animal, as are Cat and Mouse. 
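
A minimal Python sketch of this relationship is shown below. The attribute and method names follow the description above; the bark method and the name attribute are additions made up for the example.

```python
class Animal:
    """General class holding what all animals have in common."""

    def __init__(self, gender, carnivore):
        self.gender = gender
        self.carnivore = carnivore

    def move(self):
        return "moving"

    def sleep(self):
        return "sleeping"


class Dog(Animal):
    """A Dog 'is a kind of' Animal with some extras of its own."""

    def __init__(self, gender, name):
        super().__init__(gender, carnivore=True)
        self.name = name          # attribute specific to Dog (illustrative)

    def bark(self):               # behaviour specific to Dog (illustrative)
        return "Woof"


rex = Dog("male", "Rex")
print(rex.move(), rex.bark(), isinstance(rex, Animal))
```

The Dog object can use move and sleep without redefining them, because they are inherited from Animal.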


Data abstraction 


A similar idea is that of data abstraction. 


The details of how data are actually represented are hidden. For example, when you use integers or real 
numbers in a program, you are not interested in how these numbers are actually represented in 
the computer. 


In a higher level language, it is possible to create abstract data types such as queues, stacks and 
trees. The abstract data type, for example a queue, is a logical description of how the data is viewed 
and the operations that can be performed on it. For example, elements can be added to the rear of the 
queue and removed from the front. The queue may have a maximum size that cannot be exceeded. 
The programmer using this data structure, however, is concerned only with the operations such as 
AddToQueue or RemoveFromQueue and does not need to know how the data structure is implemented 
using, for example, an array and pointers to the front and rear of the queue. 
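
One possible implementation is sketched below in Python, assuming a fixed-size list used as a circular array with front and rear pointers. The method names mirror AddToQueue and RemoveFromQueue; everything prefixed with an underscore is hidden detail that the user of the queue never needs to see.

```python
class Queue:
    """Queue abstract data type backed by a fixed-size circular array."""

    def __init__(self, max_size):
        self._items = [None] * max_size
        self._max = max_size
        self._front = 0      # index of the next item to remove
        self._rear = -1      # index of the most recently added item
        self._size = 0

    def is_empty(self):
        return self._size == 0

    def is_full(self):
        return self._size == self._max

    def add_to_queue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        self._rear = (self._rear + 1) % self._max
        self._items[self._rear] = item
        self._size += 1

    def remove_from_queue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self._items[self._front]
        self._front = (self._front + 1) % self._max
        self._size -= 1
        return item


q = Queue(3)
q.add_to_queue("A")
q.add_to_queue("B")
print(q.remove_from_queue())
```

If the implementation were later changed to use a linked list, programs calling add_to_queue and remove_from_queue would not need to change.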
Exercises 


1. “Representational abstraction is a representation arrived at by removing unnecessary details.” 


   Describe what this means in relation to a computer program which allows the user to enter a 
   starting address A and a destination address B and returns a map of the route, the number of 
   miles and the estimated journey time it will take to travel by car from A to B. [5] 


   Figure: Suggested routes (213 mi, 3 hours 43 mins) 


2. Explain how abstraction could be used in a game program in which the player has to collect 
   treasure in a cave and avoid being eaten by a monster. [5] 
Chapter 48 — Thinking ahead 


Objectives 
• Identify the inputs and outputs for a given situation 
• Determine the preconditions for devising a solution to a problem 
• Understand the need for reusable program components 
• Understand the nature, benefits and drawbacks of caching 


Computational problems 
At its most abstract level, a computational problem can be represented by a simple diagram: 


Input → Computational problem → Output 


Input is the information relevant to the problem, which could for example be passed as parameters to a 
subroutine. 

Output is the solution to the problem, which could be passed back from a subroutine. 

A clear statement of exactly what the inputs and outputs of a problem are is a necessary first step in 
constructing a solution. 


Example 1: Determine whether a given item is present in a list 
On the face of it, this is a simple problem. But do we know exactly what the inputs are? For example, is 
the list sorted? Are the items numeric or alphabetic? What about the output — are we expecting it to be 
simply True or False, or should the output give the position in the list of the item if it is found? 


The problem needs to be formally defined, stating the inputs and outputs. This can be done as follows: 
Name: SearchList 


Inputs: A list of strings S = (s₁, s₂, s₃ ... sₙ) 
        A target string t 


Outputs: A Boolean variable b 
Now we can write pseudocode for the function SearchList: 


function SearchList(s, t) 
    found = False 
    n = 0 
    while found == False AND n < len(s) 
        if t == s[n] then 
            found = True 
        else 
            n = n + 1 
        endif 
    endwhile 
    return found 
endfunction 
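
For comparison, a direct Python translation of the pseudocode might look like the sketch below. The function name search_list and the sample data are chosen for this example only.

```python
def search_list(s, t):
    """Return True if the target string t is present in the list s."""
    found = False
    n = 0
    while not found and n < len(s):
        if t == s[n]:
            found = True
        else:
            n = n + 1
    return found


names = ["Ann", "Bob", "Cas"]
print(search_list(names, "Bob"))
print(search_list(names, "Zed"))
```

The list does not need to be sorted, and an empty list simply returns False because the loop body never runs.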
Specifying preconditions 


Suppose that a pseudocode algorithm has been written to find the maximum of a list of numbers. 


function maxInt(listInt) 
    maxNumber = listInt[0] 
    for i = 1 to len(listInt) - 1 
        if listInt[i] > maxNumber then 
            maxNumber = listInt[i] 
        endif 
    next i 
    return maxNumber 
endfunction 


If the function is called with an empty list, it will crash on the statement 
maxNumber = listInt[0] 


In order to make sure the function never crashes, either the function must test for an empty list, or a 
precondition must be specified with the documentation for the function. 


Name: maxInt 
Inputs: A list of integers listInt = (k₁, k₂, k₃ ... kₙ) 
Outputs: An integer maxInt 


Precondition: length of listInt > 0 
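
The first option, testing for an empty list inside the function, could be sketched in Python as follows. Raising a ValueError is one possible design choice made for this example; returning a special value would be another.

```python
def max_int(list_int):
    """Return the largest integer in list_int.

    Precondition: len(list_int) > 0 (checked here rather than left to the caller).
    """
    if len(list_int) == 0:
        raise ValueError("precondition failed: list_int must not be empty")
    max_number = list_int[0]
    for i in range(1, len(list_int)):
        if list_int[i] > max_number:
            max_number = list_int[i]
    return max_number


print(max_int([3, 9, 2]))
```

If the precondition is instead stated only in the documentation, the check can be removed, but the caller then becomes responsible for never passing an empty list.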


Advantages of specifying preconditions 


• Specifying preconditions as part of the documentation of a subroutine ensures that the user knows 
  what checks, if any, must be carried out before calling the subroutine. 


• If there are no preconditions, then the user can be confident that necessary checks will be carried out 
  in the subroutine itself, thus saving unnecessary coding. The shorter the program, the easier it will be 
  to debug and maintain. 


• Clear documentation of inputs, outputs and preconditions helps to make the subroutine reusable. 
  This means that it can be put into a library of subroutines and called from any program with access to 
  that library. 


The need for reusable program components 


The Windows DLL (Dynamic Link Library) is an example of a package of reusable program components. 
Programming languages have libraries of functions to perform common tasks, from printing to finding 
a square root to generating a random number. 
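
In Python, for instance, two of these tasks are handled by the standard library modules math and random. The variable names below are chosen for this example only.

```python
import math
import random

side = math.sqrt(144)          # library routine for finding a square root
roll = random.randint(1, 6)    # library routine for a random integer from 1 to 6 inclusive

print(side, roll)
```

Neither routine had to be written, debugged or tested by the programmer using it.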
In a large project, programmers may create their own libraries of reusable components. If, for example, 
abstract data structures such as queues, stacks or trees are used, routines to traverse, add to or delete 
from these data structures may be required in many different modules making up the whole project. 
Clearly, having components which have already been written, debugged and thoroughly tested will save 
time in completing the project. 


A-Level only 
Nature and benefits of caching 


Caching is another aspect of thinking ahead, this time done automatically by the operating system 
rather than the programmer. Caching is the temporary storage of program instructions or data that have 
been used once and may be needed again shortly. The last few instructions of a program may be stored 
in cache memory for quick retrieval. 

Web caching, i.e. the storing of HTML pages and images recently looked at, is another example of 
caching. This gives fast access to pages that have been recently looked at (and may be returned to) and 
saves having to download pages again, using up bandwidth unnecessarily. 


Exercises 
1. Explain the benefits of specifying inputs, outputs and preconditions in the documentation for 
a subroutine which will be saved in a library of subroutines for importing into many programs. [6] 


2. Give two examples of reusable program components in a programming language with which 
you are familiar. [2] 


A-Level only 


3. Explain what is meant by caching and give an example of when it is used in a computer system. [2] 
Chapter 49 — Thinking procedurally 


Objectives 


• Identify the components of a problem 
• Identify the components of a solution to the problem 
• Determine the order of steps needed to solve a problem 
• Identify sub-procedures necessary to solve a problem 


Procedural abstraction 


Computer science is, in broad terms, the study of problem-solving, and as such is also the study of 
abstraction. As we have seen, abstraction allows us to separate the physical reality of a problem from 
the logical view. Thus, for example, you can send an email, play music or download an image without 
knowing any of the detail of how these things are actually done. On the other hand, the computer 
engineers, technicians and system administrators who enable these things to happen have a very 
different view. They need to be able to control the low-level details that users are not even aware of. 


Procedural abstraction means using a procedure to carry out a sequence of steps for achieving some 
task such as calculating a student's grade from her marks in three exam papers, buying groceries online 
or drawing a house on a computer screen. 


Consider, for example, how you could code a program to create the plan for an estate of 100 new 
houses. You could use a procedure which will draw a triangle of certain dimensions and colour. 
The colour and dimensions are passed as arguments to the procedure, for example: 


procedure drawTriangle(colour, base, height) 


This procedure may be called using the statement 


drawTriangle("red", 4.5, 2.0) 


The programmer does not need to know the details of how this procedure works. She simply needs to 
know how the procedure is called and what arguments are required, what data type each one is and 
what order they must be written in. This is called the procedure interface. 


Similarly, there may be a procedure to build a rectangle that is defined by parameters colour, height and 
width, which are passed as arguments: 


drawRectangle("beige", 4.0, 5.0) 


To draw a house at a given position on the screen, the programmer may write a procedure buildHouse() 
which uses the drawTriangle() and drawRectangle() procedures, aligns them and positions the house at a 
particular position on the screen. All these variables will be passed as arguments to the procedure. 


Several houses could be combined to make a street. Several streets could be drawn to represent 
the estate. 


Then, if the builder of the new estate decides to make all the houses larger, the procedure for drawing the 
house does not need to be changed - it is simply called with new arguments. 
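
The layering described above can be sketched in Python. This is only an illustration: instead of drawing on a screen, each procedure returns a description of the shape, and the parameter lists of build_house and build_street are assumptions made for the example.

```python
def draw_triangle(colour, base, height):
    return {"shape": "triangle", "colour": colour, "base": base, "height": height}


def draw_rectangle(colour, height, width):
    return {"shape": "rectangle", "colour": colour, "height": height, "width": width}


def build_house(x, y, width, wall_height, roof_height):
    """A house is a rectangle with a triangle of the same width on top."""
    walls = draw_rectangle("beige", wall_height, width)
    roof = draw_triangle("red", width, roof_height)
    return {"position": (x, y), "parts": [walls, roof]}


def build_street(number_of_houses, width, wall_height, roof_height, gap):
    """A street is a row of houses spaced evenly along the x axis."""
    return [
        build_house(i * (width + gap), 0, width, wall_height, roof_height)
        for i in range(number_of_houses)
    ]


street = build_street(3, 5.0, 4.0, 2.0, 1.0)
print(len(street), street[1]["position"])
```

Making every house larger means calling build_street with different arguments; none of the procedures has to be rewritten.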
Figure: Levels of procedural abstraction. The procedure for drawing a street depends on the procedure 
for drawing a house, which in turn depends on the procedures for drawing a rectangle and a triangle. 
Each level hides detail from the level above it. 


Problem decomposition 


Most computational problems beyond the trivial need to be broken down into sub-problems before they 
can be solved. Think of any system which starts off by presenting the user with a menu of choices. Each 
choice will result in a different, self-contained module. 


Top-down design 


Top-down design is the technique of breaking down a problem into the major tasks to be performed; 
each of these tasks is then further broken down into separate subtasks, and so on until each subtask 
is sufficiently simple to be written as a self-contained module or subroutine. Remember that some 
programs contain tens of thousands, or even millions, of lines of code, and a strategy for design is 
absolutely essential. Even for small programs, top-down design is a very useful method of breaking down 
the problem into small, manageable tasks. 


Advantages of problem decomposition 


As well as making the task of writing the program easier, breaking a large problem down in this way 
makes it very much simpler to test and maintain. When a change has to be made, if each module is 
self-contained and well documented with inputs, outputs and preconditions specified, it should be 
relatively easy to find the modules which need to be changed, knowing that this will not affect the rest 
of the program. 


Hierarchy charts 


A hierarchy chart is a tool for representing the structure of a program, showing how the modules relate to 
each other to form the complete solution. The chart is depicted as an upside-down tree structure, with 
modules being broken down further into smaller modules until each module is only a few lines of code 
(never more than a page). 
Example 1 
Draw a hierarchy chart for a program which calculates and prints a customer's monthly gas bill. 


This can be broken down into several steps. 


Calculate gas bill 
    Input meter reading 
    Calculate units used 
    Calculate total bill 


‘Calculate units used’ and ‘Calculate total bill’ may now be further broken down. 


Calculate gas bill 
    Input meter reading 
    Calculate units used 
        Get previous reading 
        Calculate this month's units used 
    Calculate total bill 
        Calculate cost of units used plus standing charge 
        Add any outstanding amount owing 
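
Each box in a hierarchy chart can become one subroutine. The Python sketch below follows the gas bill breakdown; the prices, the use of whole pence and all the parameter names are assumptions made for the example, not real tariff details.

```python
def calculate_units_used(previous_reading, current_reading):
    return current_reading - previous_reading


def calculate_cost(units, pence_per_unit, standing_charge):
    """Cost of the units used plus the standing charge, in pence."""
    return units * pence_per_unit + standing_charge


def calculate_total_bill(units, pence_per_unit, standing_charge, amount_owing):
    """Add any outstanding amount owing to this month's cost."""
    return calculate_cost(units, pence_per_unit, standing_charge) + amount_owing


def calculate_gas_bill(previous_reading, current_reading,
                       pence_per_unit, standing_charge, amount_owing):
    units = calculate_units_used(previous_reading, current_reading)
    return calculate_total_bill(units, pence_per_unit, standing_charge, amount_owing)


bill = calculate_gas_bill(1000, 1150, 10, 500, 250)
print("Total bill in pence:", bill)
```

Because each subroutine does one small job, a change to the tariff affects only calculate_cost.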
Exercises 


1. Using local rather than global variables in subroutines is one way of helping to make a program 
easy to maintain. 


(a) Explain why this is the case. [3] 


(b) Describe briefly three other ways in which a program can be made easy to understand 
and maintain. [6] 


2. Draw a hierarchy chart for a quiz program which does the following: 


• asks the user 10 random multiple-choice questions from a bank of 100 questions held in a file 
• if the user gives the correct answer, gives feedback and adds 1 to the user's score 
• if they give the wrong answer, gives feedback and displays the correct answer 
• at the end of the questions, gives the score out of 10 [6] 
Chapter 50 — Thinking logically, thinking concurrently 


Objectives 


• Identify the points where a decision has to be taken 
• Determine the logical conditions that affect the outcome of a decision 
• Determine how decisions affect flow through a program 
• Determine which parts of a program can be tackled at the same time 
• Determine the benefits and trade-offs of concurrent processing 


The structured approach 


The structured programming approach aims to improve the clarity and maintainability of programs. 
Using structured programming techniques, only three basic programming structures are used: 


• sequence: one statement following another 
• selection: if ... then ... else ... endif and switch/case ... endswitch statements 
• iteration: while ... endwhile, do ... until and for ... next loops 


Languages such as Python and Pascal are block-structured languages which allow the use of just 
three control structures. They may allow you to break out of a loop, but this is not recommended in 
structured programming. Each block should have a single entry and exit point. 


Tools for designing algorithms 


Flow diagrams and pseudocode are two methods or tools which are commonly 
used for designing algorithms. Pseudocode corresponds more closely to the 
iteration structures in a programming language and is generally more useful for 
designing algorithms of any complexity. 


There are no universally accepted ways of writing pseudocode and so long as 
the meaning is clear, it is acceptable. OCR has its own standard way of writing 
pseudocode and that is used throughout this book and will be used in any exam 
questions involving pseudocode. Note that in this pseudocode the symbol == 
denotes equality in a condition. This is also how it is written in Python. 


Most of the logic errors that occur in programs occur at the points where 
decisions have to be made, or in the conditions which affect the outcome of a 
decision. This applies both to selection and iterative structures. 


The more algorithms you write, the more aware you will become of the places 
where errors are likely to occur. A useful strategy to test an algorithm is to draw a 
trace table and follow it through manually. 
Example 1 


Consider the following algorithm. It is intended to print out the number of values between a lower and 
upper bound entered by the user, that are divisible by either 3, 5 or both. 


count = 0 
first = input("Please enter lower bound: ") 
last = input("Please enter upper bound: ") 
n = first 
while n <= last 
    if n mod 5 == 0 then 
        count = count + 1 
    endif 
    if n mod 3 == 0 then 
        count = count + 1 
    endif 
    n = n + 1 
endwhile 
print("Values divisible by 3 or 5: ", count) 


There are two problems with this algorithm. The first is that it counts the value 0 as divisible by both 3 
and 5, whereas the user would probably not intend 0 to be included. We have not specified that the user 
should enter positive integers, and this should be specified as a pre-condition to the routine. 


The second problem is that any number divisible by both 3 and 5 will be counted twice. This is a logic 
error which needs to be corrected. 
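
One way of correcting both problems is sketched below in Python. Combining the two tests with a single or condition means each value is counted at most once, and the precondition is checked at the start; the function name and the choice to raise an error are assumptions made for the example.

```python
def count_divisible(first, last):
    """Count the values from first to last inclusive that are divisible by 3, 5 or both.

    Precondition: first is a positive integer.
    """
    if first < 1:
        raise ValueError("precondition failed: bounds must be positive integers")
    count = 0
    for n in range(first, last + 1):
        if n % 3 == 0 or n % 5 == 0:   # a single test, so 15 is counted once
            count = count + 1
    return count


print("Values divisible by 3 or 5:", count_divisible(1, 15))
```

A trace table for the range 1 to 15 shows the values 3, 5, 6, 9, 10, 12 and 15 being counted once each.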


Example 2 
Competitors playing in a chess tournament are awarded 2 points for a win, 1 point for a draw and 0 
points for a loss. Each player plays 12 games. 


The results for a player are held in an array of characters, with "W" representing a win, "D" representing a 
draw and "L" representing a loss. 


Write a pseudocode algorithm for a function which returns the points score of a player. Show how the 
function would be called and the result output. 
function calculatePoints(score) 
    points = 0 
    for n = 0 to len(score) - 1 
        if score[n] == "W" then 
            points = points + 2 
        elseif score[n] == "D" then 
            points = points + 1 
        endif 
    next n 
    return points 
endfunction 


// main program 
myscore = ["W", ...]    // 12 results in total, each "W", "D" or "L" 
result = calculatePoints(myscore) 
print("Points scored: ", result) 
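
A Python version of the function is sketched below. The list of twelve results is made-up sample data for illustration and is not taken from the tournament described above.

```python
def calculate_points(score):
    """Return the total points: 2 for each "W", 1 for each "D", 0 for each "L"."""
    points = 0
    for result in score:
        if result == "W":
            points = points + 2
        elif result == "D":
            points = points + 1
    return points


# Illustrative results for 12 games: 6 wins, 3 draws and 3 losses.
my_score = ["W", "D", "L", "W", "W", "D", "L", "W", "D", "W", "L", "W"]
print("Points scored:", calculate_points(my_score))
```

No branch is needed for "L" because a loss adds nothing to the total.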


A second algorithm is written to provide the administrator of the tournament with further information 
about the players’ performance. The array names holds the name of each player in the tournament, and 
the array scores holds the corresponding points score for each player. 


The algorithm is shown below. 


function playerStats(names, scores) 
    lowerCount = [] 
    for j = 0 to len(names) - 1 
        count = 0 
        for k = 0 to len(names) - 1 
            if scores[k] < scores[j] then 
                count = count + 1 
            endif 
        next k 
        lowerCount.append((names[j], count)) 
    next j 
    return lowerCount 
endfunction 


names = ["Adam", "Ben", "Carol", "Davina", "Enid", "Fred", "George", 
         "Henry", "Ian", "Jane", "Keith"] 
scores = [14, 3, 21, 14, 15, 10, 20, 6, 10, 12, 10] 
lowerCount = playerStats(names, scores) 
for n = 0 to len(names) - 1 
    print(lowerCount[n][0], lowerCount[n][1]) 
next n 
In the above algorithm, the function playerStats returns a list of tuples called lowerCount. Each 
tuple consists of a player's name and an integer count, which is the number of players with a lower score: 


((names[0], count[0]), (names[1], count[1]) ... (names[10], count[10])) 


The first line output in the main program is 


Adam 6 
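
The same nested-loop logic can be tried out in Python on a smaller data set. The three names and scores below are a cut-down sample chosen for this example so that the result is easy to check by hand.

```python
def player_stats(names, scores):
    """For each player, count how many players have a strictly lower score."""
    lower_count = []
    for j in range(len(names)):
        count = 0
        for k in range(len(names)):
            if scores[k] < scores[j]:
                count = count + 1
        lower_count.append((names[j], count))
    return lower_count


names = ["Adam", "Ben", "Carol"]
scores = [14, 3, 21]
for name, count in player_stats(names, scores):
    print(name, count)
```

With n players the inner comparison runs n × n times, which is worth noticing when tracing the algorithm for a larger tournament.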


A-Level only 


Thinking concurrently 


The distinction between concurrent computing and parallel computing is debatable, and the two terms are 
often taken to mean the same thing. For example, a house may have a burglar alarm system which 
continually monitors the front door, back door, windows, rooms upstairs and downstairs. 


Generally, concurrent computing is defined as being related to but distinct from parallel computing. 
Parallel computing requires multiple processors each executing different instructions simultaneously, 
with the goal of speeding up computations. It is impossible on a single processor. 


Concurrent processing, on the other hand, takes place when several processes are running, with 
each in turn being given a slice of processor time. This gives the appearance that several tasks are 
being performed simultaneously, even though only one processor is being used. Processor scheduling 
algorithms are covered in Section 2, Chapter 7. 
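
Time slicing can be modelled in Python with a simple round-robin loop. This is a toy simulation rather than a real operating system scheduler: each task is a generator, one call to next() stands for one slice of processor time, and the task names and step counts are made up.

```python
from collections import deque


def task(name, steps):
    """A task that needs the given number of time slices to finish."""
    for i in range(1, steps + 1):
        yield name + " step " + str(i)


def round_robin(tasks):
    """Give each unfinished task one time slice in turn; return the order of work done."""
    queue = deque(tasks)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))   # run the task for one slice
            queue.append(current)       # then send it to the back of the queue
        except StopIteration:
            pass                        # task has finished, so it is dropped
    return log


for line in round_robin([task("A", 2), task("B", 3)]):
    print(line)
```

The work of the two tasks is interleaved, so both appear to make progress at once even though only one step runs at any moment.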


Benefits and trade-offs of concurrent processing 
Concurrent processing has benefits in many situations. 


• Increased program throughput: the number of tasks completed in a given time is increased 


• Time that would be wasted by the processor waiting for the user to input data or look at output is 
  used on another task 


• The drawback is that if a large number of users are all trying to run programs, and some of these 
  involve a lot of computation, these programs will take longer to complete 


Benefits and trade-offs of parallel processing 


• Parallel processors enable several tasks to be performed simultaneously by different processors. This 
  can speed up processing enormously when repetitive calculations need to be performed on large 
  amounts of data 


• Graphics processors can quickly render a 3-D object by working simultaneously on individual 
  components of the graphic 


• A browser can display several web pages in separate windows and one processor may be carrying 
  out a lengthy search or query while processing continues in other windows 


• Parallel processing has limitations; there is an overhead in coordinating the processors and some 
  tasks may run faster with a single processor than with multiple processors. 
Exercises 


1. A plumber charges for parts and labour. Labour is charged at £20 per half hour or part of a half hour. 


The time spent is recorded as a four-digit integer, so that for example 
0120 means that 1 hour and 20 minutes labour is to be charged 
0350 means that 3 hours and 50 minutes labour is to be charged 
A variable called duration holds the four-digit integer representing time spent. 
(a) Write a subroutine to calculate and return the labour charge. 
(b) Identify two local variables used in your subroutine. 


(c) Show how the subroutine will be called using a parameter. 


2. In a vote for which of three plays produced at a theatre was most enjoyable, the total votes cast 
for each of plays “A”, “B” and “C” have been stored in an array totalVotes. 


The following algorithm has been written to output the play with the most votes. 


01 if totalVotes[0] > totalVotes[1] then 
02     if totalVotes[0] > totalVotes[2] then 
03         print("Play A") 
04     endif 
05 else 
06     if totalVotes[1] > totalVotes[2] then 
07         print("Play B") 
08     else 
09         print("Play C") 
10     endif 
11 endif 


(a) In the event that an equal number of votes is cast for each play, 
    (i) which lines of the algorithm will be executed? 
    (ii) what will be printed? 


(b) Write an algorithm so that the result is always printed correctly in the event of two or three 
    plays all receiving the same number of votes. 


A-Level only 
3. (a) Distinguish between parallel processing and concurrent processing. 


(b) A school runs a local area network linking computers throughout the school. Describe how 
concurrent processing can be achieved on the network. 


(c) When a class of students all try to download a piece of software at the beginning of a class, 
performance is affected. Explain why. 
Chapter 51 — Problem recognition


Objectives 


* Know what features of a problem make it soluble by computational methods
* Categorise different types of problem and solutions
* Explore different strategies for problem-solving
* Understand the concept and application of the “divide and conquer” approach


Computable problems 


A problem is defined as being computable if there is an algorithm that can solve every instance of it in a
finite number of steps. Some problems may be theoretically computable, but if they take millions of years
to solve, they are, in a practical sense, insoluble.


An example of such a problem is the cracking of a secure password. If you choose a password of 10
characters or more, comprising a mixture of random letters, numbers and special symbols, it becomes
practically impossible to crack by trying every possible combination. You can test the strength of your
passwords on various websites.
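The scale of the search can be estimated with a few lines of Python. This is only a rough sketch: the alphabet of 95 printable ASCII characters and the rate of 10 billion guesses per second are assumptions chosen for illustration, not figures from the text.

```python
# Rough estimate of the brute-force search space for a random password.
# Assumptions (illustrative only): 95 printable ASCII characters and an
# attacker able to test 10 billion passwords per second.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_to_try_all(length, alphabet_size=95, guesses_per_second=10**10):
    """Return the years needed to try every password of the given length."""
    combinations = alphabet_size ** length
    return combinations / guesses_per_second / SECONDS_PER_YEAR


for length in (6, 8, 10, 12):
    print(length, "characters:", years_to_try_all(length), "years")
```

Each extra character multiplies the number of combinations by the size of the alphabet, which is why a small increase in length makes such a large difference.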


Methods of problem solving 


There are many ways of problem solving, including: 


* enumeration (listing all cases)
* simulation
* theoretical approach
* creative solution


Enumeration 
Theoretically, many problems and algorithmic puzzles can be solved by exhaustive search - trying 
all possible solutions until the correct one is found. Thousands of problems which were in the past 
insoluble have, thanks to the power of modern computers, become soluble. For example, a database of 
fingerprints or DNA can within a reasonable time find the identity of an individual, if his or her fingerprints 
or DNA are on the database. 


The most important limitation of the exhaustive search strategy is its inefficiency — in general, the number 
of possible solutions increases exponentially as the size of the problem increases. 
Consider, for example, the problem of constructing a magic square of order 3. The problem can be
stated as follows:


Fill the 3 x 3 square with the integers 1 to 9 in such a way that the sum of each row, column 
and corner-to-corner diagonal is the same. 


How many possibilities are there? There is a choice of 9 numbers for the first square, 8 for the second,
and so on, giving 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 362,880 ways of arranging the 9 numbers. This is 9!
(spoken “9 factorial”).


A magic square of 5 rows and columns has 25! possible arrangements, and it would take a computer making
10 trillion operations per second about 49,000 years to try all the options.
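For the order 3 square, exhaustive search is entirely practical. The sketch below is one possible way to enumerate all 9! arrangements in Python; the function name is our own, and the target sum of 15 follows from the numbers 1 to 9 adding up to 45, shared between three rows.

```python
from itertools import permutations


def magic_squares_of_order_3():
    """Try every arrangement of 1..9 and keep those that form a magic square."""
    solutions = []
    for p in permutations(range(1, 10)):
        rows = [p[0:3], p[3:6], p[6:9]]
        cols = [p[0::3], p[1::3], p[2::3]]
        diagonals = [(p[0], p[4], p[8]), (p[2], p[4], p[6])]
        # Every row, column and diagonal must add up to 15
        if all(sum(line) == 15 for line in rows + cols + diagonals):
            solutions.append(rows)
    return solutions


squares = magic_squares_of_order_3()
print(len(squares), "magic squares found")
for row in squares[0]:
    print(row)
```

The same approach applied to a 5 x 5 square would need to examine 25! arrangements, which is why a smarter algorithm is needed for larger sizes.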


There are in fact algorithms which will find solutions for magic squares of any size. This is the theoretical
approach, which will generally find results considerably faster than a “brute force” method of solution.


Simulation


Simulation is the process of designing a model of a real system in order to understand the behaviour of 
the system, and to evaluate various strategies for its operation. Such problems include: 


* financial risk analysis
* population predictions
* queueing problems
* climate change predictions
* engineering design problems

Simulating a system invariably makes use of abstraction to reduce the problem to its essentials,
removing all unnecessary details. Queueing problems, for example, include problems of finding out how
many checkouts are needed in a new supermarket or on a new toll road, or how many staff are needed in
a software support department to man the helplines, or in a tax office to process tax returns.
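A queueing problem of this kind might be simulated along the following lines. This is a minimal sketch under invented assumptions: customers arrive at random at an average of one per minute, each takes between 1 and 4 minutes to serve, and everyone waits in a single queue for the next free checkout. None of these figures come from the text.

```python
import heapq
import random


def average_wait(num_checkouts, num_customers=1000, seed=1):
    """Simulate one queue feeding several checkouts; return the mean wait in minutes."""
    rng = random.Random(seed)              # fixed seed so that runs are repeatable
    free_at = [0.0] * num_checkouts        # time at which each checkout is next free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(1.0)      # next arrival, on average one per minute
        service = rng.uniform(1.0, 4.0)    # time needed to serve this customer
        earliest = heapq.heappop(free_at)  # the checkout that becomes free first
        start = max(clock, earliest)
        total_wait += start - clock
        heapq.heappush(free_at, start + service)
    return total_wait / num_customers


for checkouts in range(2, 7):
    print(checkouts, "checkouts: average wait", round(average_wait(checkouts), 2), "minutes")
```

Running the model with different numbers of checkouts shows how the average wait responds, which is the kind of question the real planners need answered.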


Simulation can also involve building a physical model of, for example, a spacecraft, ship or wind turbine, 
so that its behaviour can be studied. This is obviously useful when it would be too expensive, dangerous 
or impractical to carry out tests on the real thing. A model can be used to evaluate performance or test 
outcomes. 
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Strategies for problem solving 


In Chapter 49 we looked at decomposition as a strategy for solving large, complex problems. Top-down 
design involves breaking a large task down into several smaller tasks, which are again broken down until 
each one is a small, manageable subtask. This is an excellent strategy for problem-solving. 


Divide and conquer 
This is a very powerful technique which essentially reduces the size of the problem with every iteration. Its 
best-known application is the binary search (see Chapter 60), which halves the size of the problem with 
each iteration. Other problems may be tackled in this way but do not necessarily reduce the problem so fast. 
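As an illustration, a binary search on a sorted list could be written in Python as follows. This is a sketch only; the function name and the convention of returning -1 when the item is absent are our own choices.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low = 0
    high = len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1      # discard the lower half
        else:
            high = mid - 1     # discard the upper half
    return -1


names = ["Ann", "Bob", "Carla", "Dev", "Eli", "Fay", "Gus"]
print(binary_search(names, "Eli"))
print(binary_search(names, "Zoe"))
```

Each pass through the loop halves the part of the list still to be searched.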


Problem abstraction 


Problem abstraction involves removing details until the problem is represented in a way that it is possible 
to solve because it reduces to one that has already been solved. 


Consider the following problem: There are four knights on a 3x3 chessboard: the two white knights
are at the bottom two corners, and the two black knights are at the two upper corners. The goal is to
switch the knights in the minimum number of moves so that the white knights are in the upper corners
and the black knights are in the bottom corners. (A knight can only move in the following manner: one
or two squares horizontally or vertically, followed by two squares or one square at right angles, moving 3
squares in total.)


We can abstract this problem by first numbering the squares of the chessboard 1 to 9. Now we can
draw lines from 1 to 6 and 1 to 8 representing the two possible moves from square 1. Do the same for
each square in turn, and you end up with the graph shown in Figure (b). (Square 5 can't be reached with a
knight's move so it is omitted from this graph.)
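The graph can also be generated by program. The sketch below assumes that the squares are numbered 1 to 9 row by row from the top left, which is consistent with the moves from square 1 to squares 6 and 8; the function name is our own.

```python
def knight_graph(size=3):
    """Map each square number to the list of squares a knight can reach from it."""
    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    graph = {}
    for row in range(size):
        for col in range(size):
            square = row * size + col + 1
            reachable = []
            for dr, dc in jumps:
                r, c = row + dr, col + dc
                if 0 <= r < size and 0 <= c < size:   # stay on the board
                    reachable.append(r * size + c + 1)
            graph[square] = sorted(reachable)
    return graph


graph = knight_graph()
for square, neighbours in graph.items():
    print(square, "->", neighbours)
```

Every square except square 5 has exactly two neighbours, which is why the graph can be rearranged into a single circle.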
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A-Level only 
Figure (b) is not much help in solving the problem. Now imagine that all the vertices are joined by a single
string, and rearrange the string so that the vertices form a circle — this gives us a much more revealing
picture. There are only two ways to solve the puzzle in the minimum number of moves: move the knights
along the edges in either a clockwise or a counter-clockwise direction until each of the knights reaches
the diagonally opposite corner for the first time.


This is the “graph unfolding” method of solution. The puzzle has been reduced to a more general problem
that has already been solved in the same way.


Automation 


Automation in computer science deals with building and putting into action models to solve problems. 
For example, you could model the financial implications of running an ice-cream stand at a given venue 
for a week or a longer period. You have to decide on what has to be included in the model and what 
assumptions you are going to make. Then you have to create and implement the algorithms and execute 
and test the results. 


[Figure: diagram relating a physical world scenario or phenomenon, a mathematical model and an automaton, with the step labelled “automate”]


Automating the abstraction may in fact tell us more about the reality that we are modelling. 


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A computer game is being designed to simulate cars on a race track. Abstraction has been used in 
the design. 


Explain how abstraction may be applied in the creation of the game. [3] 


The goal in this problem is to place as many coins as possible at points of the 8-pointed star
depicted below, according to the following rules:
* Each coin must first be placed on an unoccupied point and then moved along a line to an
unoccupied point
* Once a coin has been positioned, it cannot be moved again.

[Figure: 8-pointed star with its points numbered 1 to 8 clockwise]

For example, you could make the following sequence of moves: 1→4, 2→5, 3→6, 7→2, 8→3
which places 5 coins.
What is the maximum number of coins that can be placed? [2]


Tip: Use the “graph unfolding” method of solution explained on the previous page. 
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Chapter 52 Problem solving 


Objectives 

* Learn about and apply the following to solve problems:
  o visualisation
  o backtracking
  o data mining
  o heuristics
  o performance modelling
  o pipelining


Visualisation 


The manner in which a problem is presented is often a very important factor in finding a solution. 
Computers work with binary numbers but humans often prefer a visual image. Consider this 
representation of a binary tree: 


Who is the parent of Harriet? Who are the children of Tara? 


It is quite difficult to work out. But if we look at the tree in its graphical form, it becomes very simple. 
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A flow diagram is a useful way of visualising an algorithm. 


Backtracking 


In some problems, in order to find a solution you have to make a series of decisions, but there may be 
cases for which: 


* you don't have enough information to know which is the best choice 
* each decision leads to a new set of choices
* one or more of the sequences may be a solution to the problem 


Backtracking is a methodical way of trying out different sequences until you find one that leads to a 
solution. Solving a maze is a typical problem of this kind, and it is the technique used in a depth-first 
traversal of a graph, covered in Chapter 63. 
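A maze solver based on backtracking might look like the following sketch. The maze layout, the symbols (# for a wall, S for the start, E for the exit) and the function name are all invented for illustration.

```python
MAZE = [
    "#########",
    "#S..#...#",
    "#.#.#.#.#",
    "#.#...#E#",
    "#########",
]


def solve(maze, row, col, visited, path):
    """Depth-first search with backtracking; returns True once 'E' is reached."""
    if maze[row][col] == "#" or (row, col) in visited:
        return False
    visited.add((row, col))
    path.append((row, col))
    if maze[row][col] == "E":
        return True
    # Try each direction in turn: down, right, up, left
    for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        if solve(maze, row + dr, col + dc, visited, path):
            return True
    path.pop()        # dead end: undo this step and go back to the last choice
    return False


visited = set()
path = []
found = solve(MAZE, 1, 1, visited, path)
print(found, path)
```

The search first wanders down a blind alley below the start, fails, removes those squares from the path and then tries the next direction. That undoing of a failed sequence of decisions is the backtracking step.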
Data mining
Data mining is the process of digging through big data sets to discover hidden connections and predict
future trends, typically involving the use of different kinds of software packages such as analytics tools.
Big data is the term used for large sets of data that cannot be easily handled in a traditional database.


Big Data analysis is quite probably going to be the most exciting, interesting and useful field of study in 
the computing world over the next decade or two. We are just at the beginning of exploring its massive 
benefits in healthcare and medicine, business, communication, speech recognition, banking, and many 
other fields. Here are some questions it can answer: 


* Does cellphone use increase the likelihood of cancer? With six billion cellphones in the world, there is
plenty of data to analyse. (The answer turned out to be “No”!)


* How can you improve voice-translation software? By scoring the probability that a given digitised
snippet of voice corresponds to a specific word. Google has made use of this data in its speech
recognition software.


* How does the Bank of England find out whether house prices are rising or falling? By analysing
search queries related to property.


* How can online education programmers use data collection to improve the courses offered?
By studying data on the percentage of thousands of students registered who rewatched a segment
of the course, suggesting it was not clear, or collecting data on wrong answers to assignments.


The term “Big Data” was first coined in the early 2000s by scientists working in fields such as astronomy
and human genome projects, where the amount of data they were collecting was so massive that
traditional methods of organising and analysing data, such as relational databases, could no longer
be used.


Intractable problems 


Some problems are termed intractable because although an algorithm may exist for their solution, it 
would take an unreasonably long time to find the solution. An example of such a problem is known as 
the Travelling Salesman Problem (TSP), which poses the question “Given a list of towns and the 
distances between each pair of towns, what is the shortest possible route that the salesman can use to 
visit each town exactly once and return to the starting point?” 


This is different from finding the shortest path from A to B. This problem has many applications in fields 
such as planning, logistics, the manufacture of microchips and DNA sequencing. 


[Figure: map showing the towns Bury St Edmunds, Framlingham, Wickham Market, Stowmarket, Ipswich and Woodbridge]
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A-Level only 


To solve the problem, we could look first at a brute-force method, testing out every combination of 
routes. 


With just five cities, the number of possible routes is: 4!=4x3x2x1=24. 


A computer could calculate the best route in a fraction of a second. 
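A brute-force search for five towns could be sketched in Python as follows. The town names A to E and the distances between them are hypothetical, chosen only to make the example runnable.

```python
from itertools import permutations

# Hypothetical symmetric distances (in miles) between five towns
DIST = {
    ("A", "B"): 12, ("A", "C"): 10, ("A", "D"): 19, ("A", "E"): 8,
    ("B", "C"): 3, ("B", "D"): 7, ("B", "E"): 2,
    ("C", "D"): 6, ("C", "E"): 20,
    ("D", "E"): 4,
}


def distance(a, b):
    """Look up the distance between two towns in either direction."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def tour_length(tour):
    return sum(distance(tour[i], tour[i + 1]) for i in range(len(tour) - 1))


def brute_force_tsp(start, others):
    """Test every ordering of the other towns; return the best tour and the number tried."""
    best = None
    count = 0
    for order in permutations(others):
        tour = (start,) + order + (start,)
        count += 1
        if best is None or tour_length(tour) < tour_length(best):
            best = tour
    return best, count


best, count = brute_force_tsp("A", ["B", "C", "D", "E"])
print(count, "routes tested; best is", best, "with length", tour_length(best))
```

With the start fixed there are 4! = 24 orderings to test. Adding a sixth town raises this to 120, a seventh to 720, and so on.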


The problem is said to be intractable because it will take a long time for a fast computer to find the
optimal solution for even a relatively small number of cities, and using the brute force algorithm, the
problem rapidly becomes impossible to solve within a reasonable time as the number of cities increases.


Comparing time complexities


The table below shows what a huge difference there is in algorithms with different orders of time 
complexity for different values of n. 


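[Table: values of functions with different orders of time complexity for increasing n. The only row recovered is consistent with 2^n for n = 10, 50, 100 and 1000: 1024; a 16-digit number; a 31-digit number; a 302-digit number.]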
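The difference in growth can be checked directly, since Python handles arbitrarily large integers. The choice of n squared, n cubed and 2 to the power n as the functions to compare, and the values of n, are our own assumptions for this sketch.

```python
def digits(value):
    """Return the number of decimal digits in a positive integer."""
    return len(str(value))


for n in (10, 50, 100, 1000):
    print("n =", n,
          "| n^2 =", n ** 2,
          "| n^3 =", n ** 3,
          "| 2^n has", digits(2 ** n), "digits")
```

The polynomial functions stay within a manageable range, while the exponential function quickly produces numbers far too large for any computer to count through.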


Intractable problems, which have no efficient algorithms to solve them, are in fact quite common; so how 
can solutions to these problems be found? 


Heuristic methods 


Not all intractable problems are equally hard, and not all instances of a given intractable problem are
equally hard. Brute-force algorithms are not the only option for solving these problems. It may be 
quite simple to get an approximate answer, or an answer that is good enough for a particular purpose. 
One approach is to find a solution which has a high probability of being correct. 


Another approach is to solve a simpler or restricted version of the problem, if that is possible. This may 
give useful insights into possible solutions. 

An approach to problem solving which employs an algorithm or methodology not guaranteed to be 
optimal or perfect, but is sufficient for the purpose, is called a heuristic approach. An adequate solution 
may be achieved by trading optimality, completeness, accuracy or precision for speed. The objective is to 
find a good solution in a reasonable time frame. 
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A-Level only 


We often apply heuristics in our everyday lives — if we want to travel from A to B we may use a route that
we already know, even if it is not the best one. An employer who interviews several people for a job may
see several suitable candidates, and make a decision based on two or three factors, ignoring others which
may be relevant to the decision. In psychology, a heuristic is a mental shortcut that allows people to make a
judgement and solve problems, while being aware that the solution may not necessarily be the optimal one.


Returning to the Travelling Salesman Problem (TSP), a large number of heuristic solutions has been 
developed, the best of which (developed in 2006) can compute a solution within two or three percent of 
an optimal tour for as many as 85,000 “cities” or nodes. 
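One of the simplest heuristics for the TSP is the “nearest neighbour” rule: from each town, travel to the closest town not yet visited. It is shown here only as an example of the heuristic idea, not as the method referred to above, and the town coordinates are invented for illustration.

```python
import math

# Hypothetical grid positions for six towns
TOWNS = {"A": (0, 0), "B": (3, 4), "C": (6, 1), "D": (2, 7), "E": (8, 6), "F": (5, 9)}


def dist(a, b):
    """Straight-line distance between two towns."""
    (x1, y1), (x2, y2) = TOWNS[a], TOWNS[b]
    return math.hypot(x2 - x1, y2 - y1)


def nearest_neighbour_tour(start):
    """Build a tour by always moving to the closest unvisited town."""
    unvisited = set(TOWNS) - {start}
    tour = [start]
    while unvisited:
        nearest = min(sorted(unvisited), key=lambda town: dist(tour[-1], town))
        tour.append(nearest)
        unvisited.remove(nearest)
    tour.append(start)        # return to the starting point
    return tour


tour = nearest_neighbour_tour("A")
length = sum(dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1))
print(tour, round(length, 2))
```

The tour is found after a handful of comparisons rather than a search through every ordering, but there is no guarantee that it is the shortest one.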


Performance modelling 


Performance modelling is the process of simulating different user and system loads on a computer using 
a mathematical approximation, rather than doing actual performance testing which may be difficult and 
expensive. For example, it could be used to test the performance of a network under different conditions. 
The output from the performance model may then be used to help with planning a new system which is 
suited to the requirements of an organisation. 


Pipelining 
Pipelining is the technique of splitting tasks into smaller parts and overlapping the processing of each
part of the task. It is commonly used in the microprocessors of personal computers so that, for example,
while one instruction is being fetched, another is being decoded and a third is being executed. It works
much like an assembly line.
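The overlap can be shown with a small table of clock cycles. The sketch assumes an idealised three-stage pipeline in which every stage takes exactly one cycle and nothing ever stalls; the instruction labels I1 to I4 are placeholders.

```python
STAGES = ["fetch", "decode", "execute"]


def pipeline_schedule(instructions):
    """Return, for each clock cycle, which instruction occupies each stage."""
    cycles = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            i = cycle - stage_index      # the instruction that has reached this stage
            row[stage] = instructions[i] if 0 <= i < len(instructions) else "-"
        cycles.append(row)
    return cycles


schedule = pipeline_schedule(["I1", "I2", "I3", "I4"])
for number, row in enumerate(schedule, start=1):
    print("cycle", number, row)
```

Under these assumptions four instructions finish in six cycles, whereas completing each instruction before starting the next would take twelve.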


Exercises 
1. (a) Describe what is meant by data mining. [2]
(b) Describe two applications which use data mining. [6] 


2. Describe the key features of a backtracking algorithm. Give an example of a problem which 
can be solved using this technique. [3] 


3. Explain what is meant by a heuristic solution to a problem. Give an example of when such 
a solution could be applied and why it would be an appropriate method. [4] 
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Programming techniques 


In this section: 
Chapter 53 Programming basics
Chapter 54 Selection
Chapter 55 Iteration
Chapter 56 Subroutines and recursion
Chapter 57 Use of an IDE
Chapter 58 Use of object-oriented techniques
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Chapter 53 — Programming basics 


Objectives 


* Define what is meant by an algorithm and pseudocode
* Learn how and when different data types are used
* Learn the basic arithmetic operations available in a typical programming language
* Become familiar with basic string-handling operations
* Distinguish between variables and constants


What is an algorithm? 


An algorithm is a set of rules or a sequence of steps specifying how to solve a problem. A recipe for 
chocolate cake, a knitting pattern for a sweater or a set of directions to get from A to B, are all algorithms 
of a kind. Each of them has input, processing and output. 


Ingredients: 100g plain flour, 2 eggs, 300ml milk, 1 tbsp oil, pinch salt

Method: Put flour and salt into a large mixing bowl and make a well in the centre. Crack the eggs into
the middle. Pour in about 50ml milk and the oil. Start whisking from the centre, gradually drawing the
flour into the eggs, milk and oil, etc.


In the context of programming, the series of steps has to be written in such a way that it can be 
translated into program code which is then translated into machine code and executed by the computer. 


Using pseudocode 


Whatever programming language you are using in your practical work, as your programs get more 
complicated you will need some way of working out what the steps are before you sit down at the 
computer to type in the program code. A useful tool for developing algorithms is pseudocode, which is 
a sort of halfway house between English and program statements. There are no concrete rules or syntax 
for how pseudocode has to be written, and there are different ways of writing most statements. We will 
use a standard way of writing pseudocode that translates easily into a programming language such as 
Python, Visual Basic or whatever procedural language you are learning. 


This section does not teach you how to program in any particular programming language — you will learn 
how to write programs in your practical sessions — but it will help you to understand and develop your 
own algorithms to solve problems. 
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An introduction to pseudocode statements 


Input/output statements 


Most programs will have input and output statements to allow the user to enter data and display or print 
results. Here is the pseudocode for a simple example: 


print ("What is your name?")     //display text on the screen
//wait for user input and assign the value to the variable myname
myname = input ()
print ("Hello, ", myname)


This program will ask the user to input their name, and then display “Hello, Jo” or whatever name the 
user entered. Notice that in this pseudocode, text such as “Hello, ” will be wrapped in speech marks to 
distinguish it from variables. 


We will normally use the pseudocode 
myname = input("What is your name?") 


which combines the print and input statements to display the prompt “What is your name” and then 
waits for the user to enter text and press the ENTER key. 


Comments 


Note also that anything following a // will be treated as a comment and will have no effect on the running 
of the program. Comments are very important when you come to code your programs, to document the 
code (specifying the name, author, date written and purpose of the program, for example) and to explain 
how any tricky bits of the program work.


Data types 


Different data types are held differently in the computer's memory so you need to use the correct data type 
for the task. In Section 6 the most common data types built into programming languages were listed as: 


* integer      a whole number such as -25, 0, 3, 28679
* real/float   a number with a fractional part such as -13.5, 0.0, 3.142, 100.0001
* Boolean      a Boolean variable can only take the value TRUE or FALSE
* character    a letter or number or special character typically represented in ASCII, such as a, A, 4,
               ? or %. Note that the character “4” is represented differently in the computer from the
               integer 4 or the real number 4.0
* string       anything enclosed in quote marks is a string, for example “Peter”, “123”, or “This is a
               string”. Either single or double quotes are acceptable.
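These types can be inspected in Python, used here purely as an example language. Note that Python has no separate character type, so a single character is simply a string of length 1.

```python
values = [-25, 3.142, True, "A", "Peter"]
for value in values:
    # type(value).__name__ gives the name of the data type as a string
    print(repr(value), "is held as", type(value).__name__)

# The character "4", the integer 4 and the real number 4.0 are three different things
character_code = ord("4")    # the code used to store the character "4"
as_integer = int("4")        # the whole number 4
as_real = float("4")         # the real number 4.0
print(character_code, as_integer, as_real)
```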


Common arithmetic operations 


The symbols +, -, * and / are used for the common arithmetic operations of addition, subtraction, 
multiplication and division. 


e.g. Suppose the bill in a restaurant comes to £20, and you want to divide it equally among
3 or 4 friends.


bill = 20
billBetween4 = bill/4     //returns the value 5
billBetween3 = bill/3     //returns 6.666666667
In pseudocode you can assume that billBetween3 will be automatically defined as a real
variable and will hold a value such as 6.666666667, though this may not be the case in every
programming language.


The Round function 
You can round this number using a function round. 


billBetween3 = round(billBetween3,2) //round to 2 decimal places 


This will return the value 6.67.
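In Python 3, taken here as one example language, the / operator always produces a real result, so the restaurant bill could be handled as in this sketch. The variable names follow Python's usual underscore style rather than the pseudocode.

```python
bill = 20
bill_between_4 = bill / 4      # real division: the result is a float
bill_between_3 = bill / 3
rounded = round(bill_between_3, 2)   # round to 2 decimal places

print(bill_between_4)
print(bill_between_3)
print(rounded)
```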


Exponentiation 


If you want to find, for example, 2⁵, 5 is called the exponent and you need to use exponentiation.
You can write this operation in pseudocode as


x = 2**5
or, using variables,
x = y**n


Integer division and finding a remainder


Sometimes you may want to perform integer division and find a remainder. 


For example: Twenty apples are to be divided between 3 people. How many will each receive, and how
many will be left over?


In this case you need to use the div operator to find the whole number of apples each person will
receive. The mod operator will find the remainder.


These two operations are coded differently in different programming languages, but in pseudocode you
could write the following statements:


apples = 20


applesPerPerson = 20 div 3     (written applesPerPerson = 20//3 in Python)


This will return 6 in applesPerPerson.


applesRemaining = 20 mod 3     (written applesRemaining = 20%3 in Python)


This will return 2 in applesRemaining.
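The same calculation in Python, sharing 20 apples among 3 people to match the `20 div 3` and `20 mod 3` statements, might look like this. The variable `people` is introduced here for illustration.

```python
apples = 20
people = 3

apples_per_person = apples // people     # integer division (div)
apples_remaining = apples % people       # remainder (mod)
print(apples_per_person, apples_remaining)

# divmod returns the quotient and the remainder together
quotient, remainder = divmod(apples, people)
print(quotient, remainder)
```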
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String-handling functions 


Programming languages have a number of built-in string-handling methods or functions. Some of the 
common ones in a typical language are: 


len(string)         Returns the length of a string

string.find(str)    Determines if str occurs in string. Returns index (the position of the first
                    character in the string) if found, and -1 otherwise. In our pseudocode we will
                    assume that string(1) is the first element of the string, though in Python, for
                    example, the first element is string[0]

ord("a")            returns the integer value of a character (97 in this example)

chr(97)             returns the character represented by an integer ("a" in this example)


To concatenate or join two strings, use the + operator. 


e.g. “Johnny” + “Bates” = “JohnnyBates” 
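In Python these operations behave as shown in the sketch below. Remember that Python counts string positions from 0, unlike the pseudocode convention assumed above.

```python
name = "Johnny" + "Bates"            # concatenation with the + operator
length = len(name)                   # number of characters in the string
position = name.find("Bates")        # index of the first match, counting from 0
missing = name.find("Smith")         # -1 because "Smith" does not occur

print(name, length, position, missing)
print(ord("a"), chr(97))
```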


String conversion operations 


int("1")                  converts the character “1” to the integer 1
str(123)                  converts the integer 123 into a string “123”
float("123.456")          converts the string “123.456” to the real number 123.456
str(123.456)              converts the real number 123.456 to the string “123.456”
date(year, month, day)    returns a number that you can calculate with

Example:


date1 = date(2015,1,18)
date2 = date(2014,12,30)
days = date1 - date2
print (date1, date2, days)


This will output
2015-01-18 2014-12-30 19


The actual code in, for example, Python or VB will be similar but not identical. You may need to import a 
datetime library module. 
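In Python, for instance, the date example could be written with the standard `datetime` module. Subtracting two dates gives a `timedelta` object, so `.days` is used to obtain the whole number of days.

```python
from datetime import date

date1 = date(2015, 1, 18)
date2 = date(2014, 12, 30)
days = (date1 - date2).days      # difference between the dates in whole days
print(date1, date2, days)

# The conversion functions listed above exist under the same names in Python
print(int("1"), str(123), float("123.456"), str(123.456))
```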
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Constants and variables 


Some programming languages require you to declare all variables and constants before they are used in 
the program. 


Variables are identifiers (names) given to memory locations whose contents will change during the 
course of the program; we have seen plenty of examples of these — e.g. in the statement below, the 
variable myname will change according to what the user enters. 


myname = input("Please enter your name: ") 


Some programming languages also allow you to define constants, whose value never changes while the 
program is being run. For example, if your program involved calculating the area of a circle, you could 
define pi at the start of the program as a constant having the value 3.14159. Or, you might hold the 
company phone number as a constant, declared at the start of the program as 


const companyPhone = "01453 123456" 


The advantage of using a constant is that in a long, complex program there is no chance that a 
programmer will accidentally change its value by using the identifier for a different purpose. 


Some languages such as Python do not require or even allow you to declare variables or constants in
advance — you just use them as and when required in the program.


Standards for variable names 


Most programming languages are very flexible in the format of variable names that you can use. 
Typically they must start with a letter or an underscore, with the rest of the name consisting of letters, 
numbers or underscore. Spaces and other characters are not permitted. 


You should always try to use meaningful names for variables, rather than x, y and z, as this helps to make
the program easy to follow and update when required. It is also helpful, within a team of programmers,
to have standards for naming variables and constants, as this will leave less room for errors and
inconsistencies in the names in a large program.


Guidelines could include: 


* Start all variable names with a lowercase letter
* Do not use underscores in the middle of variable names
* Use “camelCaps” to separate parts of a variable name — for example, timeInMinutes, maxTemperature
* Do not use overly long names but keep them meaningful - maxTemp is better than
maximumTemperature if there is not likely to be any confusion over the meaning of max
* Use all uppercase letters for constants, which are then instantly identifiable
* When defining a class in object-oriented programming, start with an uppercase letter, with the rest of
the class name lowercase


Following guidelines such as these will save a lot of time in looking through a program to see whether you
called something best_score, Best_Score, bestScore or some other variation.
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Exercises 


1. A school keeps data about each of its pupils. State the most suitable data type for each of the
following data items: 
Pupil’s surname 
A single letter indicating whether they are male or female 
The amount owed for school trips 
The number of school trips they have participated in 
Whether or not the pupil is entitled to free school meals [5] 


2. (a) Write pseudocode for a program which asks the user to enter the total bill for a restaurant 
meal, and the total number of people who had a meal. The program should add 10% to the 
bill as a tip, and then calculate and display to the nearest penny what each person owes, 
assuming the bill is evenly split. [6] 


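(b) Complete the following table showing an additional two sets of test data, the reason for
each test and the expected result. [6]


[Table: columns for test data, reason for test and expected result. The row provided gives the reason “Total amount exactly divisible by number of people”.]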


3. (a) Name two ways in which you can help to make your programs understandable to another 
programmer. [2] 


(b) Imagine that you have had a stall at the Summer Fayre. At the end of the day you count up 
the number of each 1p, 2p, 5p, 10p, 20p and 50p coins you have received. 


Write a pseudocode algorithm to allow the user to input the number of coins of each value, 
and to calculate and display the total takings. 


Make use of two ways of making the program understandable given in your answer to part (a). [6]


4. Below is an algorithm that adds VAT to the net price of an item and outputs the total price.


VATRATE = 20              // rate of VAT, currently 20%
NetPrice: Real            // net price is price without VAT
PriceWithVAT: Real        // total price is price including VAT
AmountOfVAT: Real         // amount of VAT to be added
NetPrice = input("Enter net price: ")
AmountOfVAT = NetPrice * VATRATE / 100
PriceWithVAT = NetPrice + AmountOfVAT
print (AmountOfVAT)
print (PriceWithVAT)


(a) Write down one example of the following from the above algorithm:
(i) a constant; (ii) a variable; (iii) a comment [3]
(b) Suggest three standards for naming variables, and give two reasons why such standards
are useful. [5]
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Chapter 54 — Selection 


Objectives 
* Use relational operators
* Use Boolean operations AND, OR, NOT, XOR
* Use nested selection statements


Program constructs 
There are just three basic programming constructs: sequence, selection and iteration. 
Sequence is just two or more statements executed one after the other, such as 


n = input("Please enter a number: ")
nsquared = n * n
print ("The square is", nsquared)


The second statement is an assignment statement in which a value is assigned to a variable. 


In this chapter and the next, we will look at selection, iteration and recursion. 


Selection 


Selection statements are used to select which statement will be executed next, depending on some 
condition. Conditions are formulated using relational operators. 


Relational operators 


The following operators may be used in pseudocode for making comparisons:


>     greater than               <=    less than or equal
<     less than                  ==    equal
>=    greater than or equal      !=    not equal


If ... then ... else 


Selection statements can take different forms, for example: 


if (expression1) then
   (do these statements)
endif


expression1 is an expression involving a relational operator such as


if (AGE >= 17) then
   canDrive = TRUE
endif
If expression1 does not evaluate to TRUE, control passes to the next statement after the if statement.


Alternatively, you can specify what should happen if the condition does not evaluate to TRUE: 


if (expression1) then
   (do these statements)
else
   (do these statements)
endif


For example: 


if mark >= 50 then
   print ("Pass")
else
   print ("Fail")
   print ("You will have to retake this test.")
endif


A ‘nested’ selection statement may have several alternatives: 


if (expression1) then
   if (expression2) then
      (do these statements)
   else
      (do these statements)
   endif
else
   (do these statements)
endif


Example 1 
A bank offers different interest rates according to how much is in the account. There are three thresholds 
of £500, £3,000 and £10,000: 


If amount is less than £500, rate = 1%
If amount is greater than or equal to £500 but less than £3000, rate = 1.5%
If amount is greater than or equal to £3000 but less than £10000, rate = 2%
If amount is greater than or equal to £10000, rate = 3.5%


The selection statement can be written as follows: 


if (amount < 500) then
   rate = 0.01
else if (amount < 3000) then
   rate = 0.015
else if (amount < 10000) then
   rate = 0.02
else
   rate = 0.035
endif
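In Python the same chain of tests could be written with if, elif and else, as in the sketch below. Wrapping it in a function called `interest_rate` is our own choice, made so that the thresholds can be tested easily.

```python
def interest_rate(amount):
    """Return the interest rate as a fraction for the given account balance."""
    if amount < 500:
        return 0.01
    elif amount < 3000:      # reached only if amount >= 500
        return 0.015
    elif amount < 10000:     # reached only if amount >= 3000
        return 0.02
    else:
        return 0.035


for balance in (499.99, 500, 2999.99, 3000, 10000):
    print(balance, interest_rate(balance))
```

Because each test is only reached when the earlier ones have failed, there is no need to repeat the lower limit in each condition.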
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The switch/case statement 


Some programming languages support the use of a switch or case statement, an alternative structure 


to a nested if statement. It is useful when a choice has to be made between several alternatives. 


Example 2 


Perform different statements according to an option choice entered by the user. 


switch choice:
   1: print("You have selected option 1")
      (more statements here)
   2: print("You have selected option 2")
      (more statements here)
   3: print("You have selected option 3")
      (more statements here)
   else
      print("You must enter 1, 2 or 3")
endswitch


Example 3


A statement to calculate the number of days in the month, for a year between 2001 and 2009, may be written:

switch month:
    "Jan", "Mar", "May", "Jul", "Aug", "Oct", "Dec": daysInMonth = 31
    "Apr", "Jun", "Sep", "Nov": daysInMonth = 30
    "Feb": if year MOD 4 = 0 then
               daysInMonth = 29
           else
               daysInMonth = 28
           endif
endswitch
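
In Python, an if ... elif chain is a common way to express the same choice. The sketch below is an illustration only; the function name days_in_month and the use of three-letter month names are assumptions. It uses the same simple leap-year test as the pseudocode, which is only a complete test for years from 1901 to 2099.

```python
def days_in_month(month, year):
    """Return the number of days in a month given as a three-letter name."""
    if month in ("Jan", "Mar", "May", "Jul", "Aug", "Oct", "Dec"):
        return 31
    elif month in ("Apr", "Jun", "Sep", "Nov"):
        return 30
    elif month == "Feb":
        # year % 4 == 0 is a complete leap-year test only for 1901 to 2099
        return 29 if year % 4 == 0 else 28
    else:
        # plays the role of the else branch of a switch statement
        raise ValueError("unknown month: " + month)


print(days_in_month("Feb", 2004))
print(days_in_month("Feb", 2005))
print(days_in_month("Sep", 2005))
```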


Boolean operators AND, OR, NOT 


More complex conditions can be formed using the Boolean operators AND and OR. 


Example 4 

if (a > b) AND (a > c) then
    max = a
else if (b > a) AND (b > c) then
    max = b
else
    max = c
endif
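
A Python sketch of the same idea is shown below. The function name largest is an assumption, and the sketch deliberately uses >= rather than >: with a strict > test, a tie such as a = 5, b = 5, c = 3 fails both conditions and falls through to the final else, giving 3.

```python
def largest(a, b, c):
    """Return the largest of three values, allowing for ties."""
    if a >= b and a >= c:
        return a
    elif b >= a and b >= c:
        return b
    else:
        return c


print(largest(1, 2, 3))
print(largest(5, 5, 3))   # a tie between the first two values
```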
Example 5


Write pseudocode for a program to allow the user to input the day of the week and output “Weekday” or
“Weekend”.


day = input("Enter day of week: ")
if (day = "Saturday") OR (day = "Sunday") then
    print("Weekend")
else
    print("Weekday")
endif

Example 6 


A tourist attraction has a daily charge for children of £5.00 on a weekday, or £7.50 on a weekend or bank 
holiday. Adults are charged £8.00 on weekdays and £12.00 on weekends and bank holidays. 
Write pseudocode to allow the user to calculate the charge for a visitor. 


day = input("Enter W for weekend, B for bank holiday or D for weekday: ")
visitor = input("Enter A for adult, C for child: ")
if ((day = "W") OR (day = "B")) AND (visitor = "A") then
    charge = 12.0
else if ((day = "W") OR (day = "B")) AND (visitor = "C") then
    charge = 7.5
else if (visitor = "A") then
    charge = 8.0
else
    charge = 5.0
endif


Notes: It is important to use brackets and to get them in the correct place to avoid any confusion over 
which operator is processed first. In standard Boolean logic the precedence rules make NOT 
highest, then AND, then OR. 


The NOT operator 
You can usually avoid the use of the NOT operator, replacing it with an appropriate condition, e.g.
NOT (a = b) is equivalent to a != b
NOT (a < b) is equivalent to a >= b


The XOR operator 


XOR stands for exclusive OR, so that a XOR b means “either a or b but not both”.
This can be implemented with a combination of AND, OR and NOT conditions: 


(a AND NOT b) OR (NOT a AND b) 


Note that NOT takes precedence over AND. Add extra brackets if you are in any doubt! 
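
The expression can be checked by printing its truth table. The Python sketch below is an illustration only; the function name xor is an assumption. For Boolean values the result is the same as testing whether a and b are different.

```python
def xor(a, b):
    """Exclusive OR built from AND, OR and NOT."""
    return (a and not b) or (not a and b)


# Print the truth table for all four combinations
for a in (False, True):
    for b in (False, True):
        print(a, b, xor(a, b))
```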
Exercises


1. Below is a segment of an algorithm.


swimTime = FALSE


if (Membership == "Premier") then
    swimTime = TRUE
else if ((Membership == "Adult") AND (Day == "Weekday") AND
         (Time < 1500)) OR
        ((Membership == "Adult") AND (Day == "Weekend")) then
    swimTime = TRUE
else if (Membership == "Junior") AND (Day == "Weekend") then
    swimTime = TRUE
endif


Write down the values of swimTime after the segment of the algorithm has executed for the 
following data: 


(i) Membership: Premier Day: Weekday Time: 1700
(ii) Membership: Adult Day: Weekday Time: 1100
(iii) Membership: Junior Day: Weekday Time: 1000
(iv) Membership: Adult Day: Weekend Time: 0900
(v) Membership: Adult Day: Weekday Time: 1530 [5]


2. (a) Write a pseudocode algorithm for a program which calculates the cost of carpeting a room.
The carpet is supplied in a roll 4m wide. The cost of the carpet is £10 per square metre. 
The program should ask the user to enter the longest dimension (length) and shortest dimension 
(width) of the room, then calculate and display the length and width and cost of carpet that 
will be supplied. 


You can assume that the width of the room is not more than 4m. If a width of more than 4m is 
entered, display an error message and quit the program. 


The length could be more or less than 4m. [5] 
(b) Calculate the expected results for the following room sizes: 
Length = 5, width = 3 
Length = 5, width = 4 
Length = 3, width = 2 
Length = 3.9, width = 2 
Length = 6, width = 5 [5] 
Chapter 55 — Iteration


Objectives
* Understand and use three different types of iterative statement: WHILE, REPEAT and FOR


Performing a loop 


In the last two chapters we looked at sequence and selection statements. The third basic programming 
construct is iteration. Iteration means repetition, so iterative statements always involve performing a 
loop in the program to repeat a number of statements. There are three different types of loop to be 
considered, although some programming languages do not implement all three.


The while ... endwhile loop 
A while ... endwhile loop has two properties: 


* The expression controlling the repetition of the loop must be of type Boolean, that is, one which
evaluates to True or False


* This expression is tested at the start of the loop


This is best explained by means of an example. Suppose you wanted to input the daily maximum 
temperatures for one month, calculate and output the average of these measurements. 


The program has to work for any month, so when you have entered all the temperatures you will 
enter a ‘dummy’ value -100 to signify that there are no more temperatures to enter. 


A first attempt at the pseudocode might look like this: 


temp = 0                // initialise temp
totalTemp = 0           // initialise total of temperatures
numberOfTemps = 0       // initialise number of temperatures
while temp != -100
    temp = input("Enter next temperature")
    totalTemp = totalTemp + temp
    numberOfTemps = numberOfTemps + 1
endwhile
averageTemp = totalTemp/numberOfTemps
print(averageTemp)


Test this algorithm with temperatures 8, 12 and -100. We can draw a trace table showing the value of the 
variables as they change during execution of the program. 
You should have ended up with 3 temperatures and an average temperature of -26.66667 instead of
10. The problem is that the expression controlling the loop is tested only once each time round, at the
beginning of the loop, and not after each statement within the loop as it is executed. Therefore, we have
to make sure that as soon as the number -100 is entered, the next thing that happens is that the Boolean
expression is tested.


temp = 0                // initialise temp
totalTemp = 0           // initialise total of temperatures
numberOfTemps = 0       // initialise number of temperatures
temp = input("Enter first temperature")   // input first temperature
while temp != -100
    totalTemp = totalTemp + temp
    numberOfTemps = numberOfTemps + 1
    temp = input("Enter next temperature")
endwhile
averageTemp = totalTemp/numberOfTemps
print("Average temperature: ", averageTemp)


Note that with a while ... endwhile loop, if the Boolean expression is FALSE at the start, the loop will not 
be executed at all and control will pass straight to the next statement after endwhile. 
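
The corrected loop can be sketched in Python as follows. This is an illustration under stated assumptions: the name average_temperature is hypothetical, and a list of readings stands in for successive keyboard inputs so that the loop can be run without typing. The sketch also guards against the case noted above, where the loop body never runs and the count is still zero.

```python
SENTINEL = -100   # dummy value that marks the end of the data


def average_temperature(readings):
    """Average the readings that come before the sentinel value."""
    values = iter(readings)          # stands in for repeated calls to input()
    total_temp = 0
    number_of_temps = 0
    temp = next(values)              # read the first temperature before the loop
    while temp != SENTINEL:
        total_temp = total_temp + temp
        number_of_temps = number_of_temps + 1
        temp = next(values)          # read last, so the new value is tested next
    if number_of_temps == 0:         # loop body never ran: avoid dividing by zero
        return None
    return total_temp / number_of_temps


print(average_temperature([8, 12, -100]))
print(average_temperature([-100]))
```

With the readings 8, 12 and -100 the sentinel is no longer added to the total, so the average is 10.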
The repeat ... until loop


This type of loop is very similar to the while ... endwhile loop, with the difference that the Boolean 
expression controlling the loop is written and tested at the end of the loop, rather than at the beginning. 
This means that the loop is always performed at least once. 


Note: Python does not support a repeat ... until statement, but the same output can be achieved 
with a while ... endwhile loop. 


Example 1 
Write pseudocode for a program which tests someone on the squares of numbers up to 25. 


// program to test a user on the squares of numbers
// random(a,b) generates a random integer between a and b
repeat
    num = random(1,25)
    numsquare = num * num
    answer = input("What is the square of ", num, "? ")
    if answer == numsquare then
        print("Correct, well done")
    else
        print("No, it is ", numsquare)
    endif
    anotherGo = input("Another go? Answer Y or N: ")
until (anotherGo == "N") OR (anotherGo == "n")
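
Since Python has no repeat ... until statement, one common idiom is a while True loop with the test and a break placed at the end of the body. The sketch below is a simplified illustration, not a translation of Example 1; the name count_attempts is an assumption, and a list of guesses stands in for user input.

```python
def count_attempts(guesses, target):
    """Take guesses one at a time until one equals target; return how many were taken."""
    remaining = iter(guesses)        # stands in for repeated calls to input()
    attempts = 0
    while True:                      # repeat
        guess = next(remaining)
        attempts = attempts + 1
        if guess == target:          # until guess == target
            break
    return attempts


print(count_attempts([3, 9, 16], 16))
print(count_attempts([16], 16))
```

Because the test comes after the body, the body is always executed at least once, which is the defining property of a repeat ... until loop.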




The for ... next loop 


This type of loop is useful when you know how many iterations need to be performed. For example, 
suppose you want to display the two times table: 


for count = 2 to 12
    product = 2 * count
    print("2 x ", count, " = ", product)
next count


The value of count starts at 2 and is incremented each time round the loop. Once the loop body has been
executed with count equal to 12, the loop terminates and the next statement is executed.
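
A Python version of this loop might look like the sketch below (the function name two_times_table is an assumption). Note one difference from the pseudocode: Python's range(2, 13) stops before 13, so the upper bound has to be written as one more than the last value wanted.

```python
def two_times_table():
    """Return the lines of the two times table from 2 x 2 to 2 x 12."""
    lines = []
    for count in range(2, 13):       # 13 is excluded, so count runs from 2 to 12
        product = 2 * count
        lines.append(f"2 x {count} = {product}")
    return lines


for line in two_times_table():
    print(line)
```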


Nested loops 


Loops can be “nested” one inside another. Suppose we want to display all the multiplication tables 
between 2 and 12. We can do this with two FOR loops, one inside the other. 


Example 2 
for table = 2 to 12
    for count = 2 to 12
        product = table * count
        print(table, " x ", count, " = ", product)
    next count
next table


Example 3 
Use a random number generator to simulate throwing a die to find out how many throws it takes to
get a 6.


totalThrows = 0
answer = "y"
while (answer == "y") OR (answer == "Y")
    numberOfThrows = 0
    throw = 0
    while throw != 6
        throw = random(1, 6)
        numberOfThrows = numberOfThrows + 1
        print("You threw a ", throw)
    endwhile
    print("That took ", numberOfThrows, " throws")
    answer = input("Another go? (Y or y): ")
endwhile
Example 4


You can count backwards as well as forwards in a for ... next loop. Here is a pseudocode program 
which uses the ‘sleep’ method to count down in seconds to blast-off. It starts by importing an external 
module called time which contains a built-in method sleep: 


import time                  // imports an external module
ReadyForCountdown = input("Press enter when you're ready to start")
for sec = 10 to 0 step -1
    print(sec)
    time.sleep(1)            // suspends execution for 1 second
next sec
print("BLAST-OFF!")


Exercises 


1. Write a pseudocode algorithm to allow the user to input two integers highestNumber and
multiplier. The program should output the results of multiplying integers 2, 3 ... highestNumber
by multiplier.


For example if the user enters 100 for highestNumber and 7 for multiplier the program should
output the numbers 14, 21 ... 98. [5]


2. Write pseudocode for a program that asks the user which times table they would like to be 
tested on, and then gives them 5 random questions on this table, telling them each time whether 
they got the answer right or wrong. [5] 
Chapter 56 — Subroutines and recursion


Objectives


* Be familiar with subroutines (functions and procedures), their uses and advantages
* Use subroutines that return values to the calling routine
* Describe the use of parameters to pass data to subroutines by value and by reference
* Contrast the use of local and global variables
* (A-Level only) Write and trace recursive subroutines
* (A-Level only) Compare recursion with an iterative approach


Types of subroutine 
A subroutine is a named block of code which performs a specific task within a program. 


Most high-level languages support two types of subroutine, functions and procedures, which are 
called in a slightly different way. Some languages such as Python have only one type of subroutine, 
namely functions. 


All programming languages have ‘built-in’ functions which you will have already used if you have written 
any programs. For example, in Python: 

myName = input("What is your name? ") 

print ("Hello, ", myName) 


A subroutine is called by writing its name in a program statement. Some functions return a result, like the
input function above, and some do not return any result, like the print function. Notice that the first
statement above both displays a prompt and reads input: when the statement is executed, the computer
will display the question “What is your name?” and wait for the user to input an answer, which will be
assigned to the variable myName.


In languages which distinguish between functions and procedures, a function is called like the input 
above and always assigns a return value to a variable. A procedure is called by writing its name but 
not assigning the result to a variable, like the print statement above. However, as we shall see later, a 
procedure can still pass values back to the calling program if necessary. 


In Chapter 53, we listed some string-handling functions, and we can write, for example, pseudocode
such as 


x = int ("567") 


to call the int function, which will convert the string “567” into an integer. 
User-written subroutines
You can write your own subroutines (functions and/or procedures) and call them from within the program 
as many times as needed. The subroutine first needs to be defined, typically above the code in the main 
program. 


Example 1 
Using pseudocode, write a subroutine which displays a menu of 3 options in a game.


procedure displayMenu      // declare the subroutine
    print("Option 1: Display rules")
    print("Option 2: Start new game")
    print("Option 3: Quit")
    print("Enter 1, 2 or 3: ")
endprocedure


To call the subroutine from the main program, you simply write its name: 
displayMenu 


This subroutine always produces the same result whenever it is called; it simply displays this menu. 


Example 2 
Sometimes, you may want a subroutine to return a value to the main program: 


function getChoice
    print("Option 1: Display rules")
    print("Option 2: Start new game")
    print("Option 3: Quit")
    print("Enter 1, 2 or 3: ")
    choice = input()
    return choice
endfunction

#main program starts here
option = getChoice
print("You have chosen ", option)


In this example, when the program is run, the first line to be executed is the first statement in the main
program, option = getChoice. The subroutine is called, it displays the menu, gets the user’s
choice in choice and returns this to the main program using the statement return choice.
Execution continues where it left off, at the statement print("You have chosen ", option).


The subroutine is called in a slightly different way from the subroutine displayMenu — compare this to 
the two different ways in which built-in print and input subroutines are called. 


print("What is your name?") 
myName = input () 
print("Hello, ",myName) 


The print subroutine does not return a value, the input subroutine does. 
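
The two calling styles can be seen side by side in the Python sketch below. It is an illustration only: the names display_menu and get_choice are assumptions, and the read parameter is a hypothetical hook that lets a simulated answer replace real keyboard input. A Python function that has no return statement hands back the special value None.

```python
def display_menu():
    """Procedure-style subroutine: performs an action and returns nothing."""
    print("Option 1: Display rules")
    print("Option 2: Start new game")
    print("Option 3: Quit")


def get_choice(read=input):
    """Function-style subroutine: returns the user's choice to the caller."""
    display_menu()
    return read("Enter 1, 2 or 3: ")


result = display_menu()              # called for its effect only
print(result)                        # None, because nothing is returned

# Simulated keyboard input: the lambda always answers "2"
option = get_choice(read=lambda prompt: "2")
print("You have chosen", option)
```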
Passing parameters by value and by reference


Frequently, you need to pass values or variables to a subroutine. The exact form of the subroutine 
interface varies with the programming language, but will be similar to the examples below: 


procedure subroutineName(parameter1, parameter2, ...)


function subroutineName(parameter1, parameter2, ...)


In some programming languages, parameters may be passed in different ways. If a parameter is passed
by value, its actual value is passed to the subroutine, where it is treated as a local variable. Changing a
parameter inside the subroutine will not affect its value outside the subroutine.

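In Python, arguments are passed by assignment: the subroutine receives a reference to the same object.
Assigning a new value to a parameter inside the subroutine therefore has no effect outside it, as with
passing by value, but changes made in place to a mutable object such as a list are visible to the caller.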


In Visual Basic or Pascal, parameters may be passed by value but they may also be passed by
reference. In this case, the address, and not the value, of the parameter is passed to the subroutine.
Therefore, if the subroutine multiplies the parameter by three, for example, the variable in the main
program will reflect that change, since both refer to the same memory location.


To pass by reference in Pascal, the procedure header will specify that the relevant parameter is a variable. 
For example: 


procedure abc(x, y : integer; var z : integer);
Here, x and y are passed by value and z is passed by reference. 
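
Python has no equivalent of Pascal's var keyword, but a similar distinction can be observed. In the sketch below (the function names are hypothetical), reassigning a parameter leaves the caller's variable untouched, much like passing by value, whereas changing a list in place is visible to the caller, much like passing by reference.

```python
def triple_number(n):
    """Reassigns the parameter; the caller's variable is not affected."""
    n = n * 3
    return n


def triple_all(values):
    """Changes the list in place; the caller sees the new contents."""
    for i in range(len(values)):
        values[i] = values[i] * 3


x = 4
triple_number(x)
print(x)            # still 4

nums = [1, 2, 3]
triple_all(nums)
print(nums)         # the original list has been changed
```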


Example 3 


Consider a simple subroutine which calculates the volume of a cylinder. In the main program, the user 
is asked to enter values for the radius and length of the cylinder. These variables are then passed as 
parameters to the subroutine for use in the calculation. 


The values of the parameters radius and length in line 9 are passed to the subroutine, where they are
referred to using the identifiers r and len respectively. The order in which the parameters are written
when calling the subroutine is important: radius is passed to r, length is passed to len.
The return value vol is passed back to the main program, where it is assigned to volume in line 9.


1  function cylinderVolume(r, len)
2      pi = 3.142
3      vol = pi * r * r * len
4      return vol
5  endfunction
6  #main program
7  radius = input("Enter the radius of the cylinder: ")
8  length = input("Enter the length of the cylinder: ")
9  volume = cylinderVolume(radius, length)
10 print("The volume of the cylinder is ", volume)
Local and global variables


Variables used in the main program are by default global variables, and these can be used anywhere in 
the program, including within any subroutines. Within a subroutine, local variables can be used within 
the subroutine, and these exist only during the execution of the subroutine. They cannot be accessed 
outside the subroutine and changing them has no effect on any variable outside the subroutine, even if 
the variable happens to have the same name as the local variable. 


In Python, variables assigned within a subroutine are local by default, unless they are declared as global
within that subroutine.


The ability to declare local variables is very useful because it ensures that each subroutine is completely 
self-contained and independent of any global variables that have been declared in the main program. 


The principle of encapsulation of all the variables needed in a subroutine is very important in 
programming. A subroutine written according to this principle can be tested independently, and used 
many times in many different programs without the programmer needing to know what variables it uses.
Any variable in the calling program which coincidentally has the same name as a local variable declared in 
the subroutine will not cause an unexpected side-effect. 


Example 4 
procedure printNumbers(x)
    a = 1
    b = 2
    c = 3
    print("In the subroutine, a,b,c and x have values ", a,b,c,x)
endprocedure
#main program
a = 4
b = 5
c = 6
x = 10
print("In the main program, a,b,c and x have values ", a,b,c,x)
printNumbers(x)
print("In the main program, a,b,c and x now have values ", a,b,c,x)
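
The behaviour can be confirmed with a Python sketch of the same program (the name print_numbers is an assumption). The assignments inside the subroutine create local variables that happen to share names with the global ones, so the global values are unchanged after the call.

```python
def print_numbers(x):
    a = 1        # local variables: unrelated to the global a, b and c
    b = 2
    c = 3
    print("In the subroutine:", a, b, c, x)


# main program
a = 4
b = 5
c = 6
x = 10
print("Before the call:", a, b, c, x)
print_numbers(x)
print("After the call:", a, b, c, x)
```

The first and last lines of output both show 4 5 6 10, while the line printed inside the subroutine shows 1 2 3 10.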
Modular programming


When a program is short and simple, there is no need to break it up into subroutines. With a long, 
complex program, however, a top-down approach, in which the problem is broken down into a number 
of subtasks, is generally very helpful in designing the algorithm for a satisfactory solution. 


Programming with subroutines 


Using subroutines in a large program has many advantages: 


* A subroutine is small enough to be understandable as a unit of code. It is therefore relatively easy to
understand, debug and maintain, especially if its purpose is clearly defined and documented


* Subroutines can be tested independently, thereby shortening the time taken to get a large program
working


* Once a subroutine has been thoroughly tested, it can be reused with confidence in different programs
or parts of the same program


* In a very large project, several programmers may be working on a single program. Using a modular
approach, each programmer can be given a specific set of subroutines to work on. This enables the
whole program to be finished sooner


* A large project becomes easier to monitor and control


Recursion 


A-Level only 
Definition of a recursive subroutine 


A subroutine is recursive if it is defined in terms of itself. The process of executing the subroutine is 
called recursion. A recursive routine has three essential characteristics: 


* A stopping condition or base case must be included which, when met, means that the routine will not
call itself and will start to ‘unwind’


* For input values other than the stopping condition, the routine must call itself


* The stopping condition must be reached after a finite number of calls
A-Level only
Recursion is a useful technique for the programmer when the algorithm itself is essentially recursive. 


Some algorithms can be written using either recursion or iteration. Recursive routines are often much
shorter, but more difficult to trace through. If a recursive routine calls itself a very large number of times
before the stopping condition is reached, the program may crash with a “Stack overflow” error (see
below, “Use of the call stack”). An iterative routine, on the other hand, has no such limit on the number of
times its loop may be repeated.


Example 
A simple example of a recursive routine is the calculation of a factorial, where n! (read as n factorial or 
factorial n) is defined as follows: 


If n = 0 then n! = 1
otherwise n! = n x (n-1) x (n-2) ... x 3 x 2 x 1
Thus for example 5! = 5 x 4 x 3 x 2 x 1


If we were calculating this manually, we would probably calculate 5 x 4 = 20, then multiply 20 by 3 and so on.
The calculation could be written as


5! = ((((5 x 4) x 3) x 2) x 1) = (((20 x 3) x 2) x 1) = ((60 x 2) x 1) = 120 x 1 = 120
This is essentially how recursion works. In pseudocode, it can be written like this:


function calcFactorial(n)
    if n == 0 then
        factorial = 1
    else
        factorial = n * calcFactorial(n-1)
        print(factorial)    //LINE A
    endif
    return factorial
endfunction


Nothing will be printed until the routine has stopped calling itself. As soon as the stopping condition is
reached, in this case n = 0, the variable factorial is set equal to 1, the return statement at the end
of the subroutine is reached and control is passed back (for the first time, but not the last) to the next
statement after the last call to calcFactorial, which is the print statement marked LINE A.
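
The order of the printed values can be checked with the Python sketch below, which follows the pseudocode closely (the names calc_factorial and calc_factorial_iterative are assumptions). An iterative version is included for comparison; it produces the same result with a loop and needs only one stack frame.

```python
def calc_factorial(n):
    """Recursive factorial; prints each partial result as the calls unwind."""
    if n == 0:
        factorial = 1                # stopping condition (base case)
    else:
        factorial = n * calc_factorial(n - 1)
        print(factorial)             # LINE A: reached only after the inner call returns
    return factorial


def calc_factorial_iterative(n):
    """Iterative factorial for comparison."""
    factorial = 1
    for i in range(2, n + 1):
        factorial = factorial * i
    return factorial


calc_factorial(4)                    # prints 1, 2, 6 and 24 on separate lines
print(calc_factorial_iterative(4))
```

The smallest value is printed first because the print statement in each call is not reached until every deeper call has returned.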
Use of the call stack


In Chapter 36 the use of the call stack was discussed. Each time a subroutine is called, the return
address, parameters and local variables used in the subroutine are held in a stack frame in the
call stack.


Consider the following example: 


1. procedure printList(num)
2.     num = num - 1
3.     if num > 1 then printList(num)
4.     print("At B, num = ", num)    // Line B
5. endprocedure
6. #main program
7. x = 4
8. printList(x)
9. print("At A, x = ", x)    // Line A


Return addresses, parameters and local variables (not used here) are put on the stack each time a
subroutine is called, and popped from the stack each time the end of a subroutine is reached. At Line 8,
for example, Line 9 (referred to here as Line A) is the first return address to be put on the stack, together
with the parameter 4, when printList(x) is called from the main program.


Representations of the current state of the stack each time a recursive call is made, and the subsequent
“unwinding”, are shown below. Each entry shows a return address (A or B) and a parameter value.


            B2
      B3    B3    B3
A4    A4    A4    A4    A4


The output from the program is:


At B, num = 1   (printed at Line B)
At B, num = 2   (printed at Line B)
At B, num = 3   (printed at Line B)
At A, x = 4     (printed at Line A)
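
The output can be reproduced with the Python sketch below. It is an illustration only: the name print_list is an assumption, and the lines are collected in a list called output as well as printed so that the order can be checked. Each call works on its own copy of num held in its own stack frame, which is why x is still 4 at Line A.

```python
output = []


def print_list(num):
    num = num - 1                          # changes this call's own copy only
    if num > 1:
        print_list(num)                    # a new stack frame is pushed here
    output.append(f"At B, num = {num}")    # Line B: runs as each frame is popped


# main program
x = 4
print_list(x)
output.append(f"At A, x = {x}")            # Line A
print("\n".join(output))
```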
Exercises


1. (a) A program may use global and local variables.
(i) Explain one difference between a global variable and a local variable. [2] 


(ii) Describe what will happen if a programmer declares a global variable and a local variable 
with the same name. [2] 


Jo has written a computer program to produce invoices for customers of her father’s plumbing 
business. 


To calculate the invoice total, the number of hours worked is rounded up to the next integer (e.g. 
67 minutes would round up to 2 hours). This is then multiplied by the hourly rate. Finally, the cost 
of parts is added. 


Here are some extracts from Jo's code. 
01 REAL HourlyRate
40 PROCEDURE Initialise
41     HourlyRate = 15
42 END PROCEDURE

60 PROCEDURE CalculateTotal
61     INTEGER TimeInMinutes
62     INTEGER CostOfParts
63     INPUT TimeInMinutes
64     INPUT CostOfParts
65     OUTPUT TimeInMinutes DIV 60 + 1 * HourlyRate + CostOfParts
66 END PROCEDURE
State one global variable and one local variable in Jo's code. [2] 
Line 65 contains an error. 


(i) Calculate the output of the procedure CalculateTotal if TimeInMinutes = 936 and 
CostOfParts = 100 using 


OUTPUT TimeInMinutes DIV 60 + 1 * HourlyRate + CostOfParts 
You must show your working. [2] 


(ii) Calculate the output of the procedure CalculateTotal if TimeInMinutes = 60 and 
CostOfParts = 0 using 


OUTPUT TimeInMinutes DIV 60 + 1 * HourlyRate + CostOfParts [1] 


(iii) Show how the procedure should be modified so that it produces the correct answer. [3] 


Evaluate the extract of Jo’s code. You should identify and explain the positive and negative 
aspects of her coding style and the implications that this will have on the maintainability of 
the program. 


The quality of written communication will be assessed in your answer to this question. [8] 
A-Level only


2. The words COW, BEEF, and FORTY have all their letters written in alphabetical order. Here is an 
algorithm for a function which checks whether all the letters in a word are in alphabetical order. 


01 FUNCTION IsInOrder(Word)
02     IF LENGTH(Word) = 1 THEN
03         RETURN TRUE
04     ELSE
05         FirstChar = First character in Word
06         RestOfWord = All characters in Word except the first
07         IF FirstChar > RestOfWord THEN
08             RETURN FALSE
09         ELSE
10             RETURN IsInOrder(RestOfWord)
11         END IF
12     END IF
13 END FUNCTION


(i) Describe what is meant by a parameter. [2] 
(ii) Identify one parameter in the algorithm above. [1]


Explain the difference between the uses of the = sign in line 02 and in line 05, stating the 
type of operation being carried out. [4] 


Line 07 compares the first character of the word with the rest of the word as shown below. 
07 IF FirstChar > RestOfWord THEN


Explain why there may be a problem with the call IsInOrder ("FoRtY") 
and what can be done to avoid this problem. [3] 


State what is meant by recursion, using this algorithm as an example. [2] 
The algorithm is tested with the call IsInOrder ("2"). State the value which will be returned. 
State the lines of the algorithm which will be executed. [2] 


Explain what happens if the algorithm is tested with a call IsInOrder("") where the value of
the argument is the empty string. [2] 


Explain what happens when the algorithm is tested with the call IsInOrder ("APE"). 


You should show each call made, the lines of the algorithm executed and the return value 
of each call. You may use a diagram. [6] 
A-Level only

3. (a) Explain briefly the main features of a recursive procedure from the programmer's point of
view. Explain what is required from the system in order to enable recursion to be used. [3]


(b) The following recursive subroutine carries out a list operation.


function listProcess(numList)
    if length(numList) > 0 then
        Remove first element of numList and store in first
        listProcess(numList)
        append first to end of numList
    endif
    return numList
endfunction


(i) Complete the following trace table if the list numbers is defined in the main program as 
numbers = [3,5,10,2] 


and the subroutine is called with the statement 
new = listProcess (numbers) 


(ii) Explain what the subroutine does. [1]
Chapter 57 — Use of an IDE


Objectives
* Be familiar with the use of an IDE to develop and debug a program
* Understand the purpose of testing and devise a test plan


Facilities of an IDE 


When you create a program you will be using a software package that helps you write the code more 
easily. 


This is called an Integrated Development Environment or IDE. 


The screenshot below shows the Komodo IDE being used.


[Screenshot: the Komodo IDE with a Python name-search program open in the editor]


Figure 57.1: Syntax error


The IDE provides many tools to help you enter, edit, compile, test and debug your programs. 


Entering a new program 


In the screenshot above, you can see a menu at the top of the screen. Choosing File, New will present 
you with a blank screen to type your program. 


The program is typed in the main window. The IDE adds line numbers for easy reference. You can save it 
using an option from the File menu. You can also edit your code. 
Compiling and running your program


When you are ready to try out your program, it first has to be translated into machine code. This will be 
done using a compiler or interpreter. 


Double-clicking the option Save and Run in the Toolbox on the right of the window in the Komodo IDE
will translate the program and report any syntax errors. 


In Figure 57.1, the interpreter has found a syntax error, and it tells you what line the error is on. There 
should be a double = sign on line 7, so you can correct that and run the program again by double- 
clicking Save and Run again. This time, the program starts to execute but then crashes because of a 
logic error, and the output is displayed in the bottom window. 


[Screenshot: the output window shows the prompt “Enter search name:” followed by a traceback ending in “IndexError: list index out of range” at line 7]


Figure 57.2: Logic error


Can you spot the logic error? Logic errors are usually much more difficult to find than syntax errors. 
The IDE has various tools to help you. 


* You can set a breakpoint in the program which will cause the program to stop on that line, so that
you can see whether it reaches that line


* You can set a watch on a variable so that its value is displayed each time it changes


* You can step through a program a line at a time so that you can see what is happening


In this program, the variable max is the length of the list and it has been incorrectly set to 7. It should be
6. Once this has been corrected, the program returns the correct result:


[Screenshot: the output window shows the prompt “Enter search name:” followed by “Search name not found”]


Figure 57.3: Program executes correctly
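
The kind of logic error described here can be reproduced with the Python sketch below. It is a reconstruction under stated assumptions, not the program from the screenshots: the list of names, the function name search and the parameter max_index are all hypothetical. With the loop limit set to 7 for a six-item list, a search for a name that is in the list still works, and the error only appears when the name is absent and the loop runs past the end of the list.

```python
names = ["Annie", "Bob", "Charles", "Dan", "Eric", "Jessie"]   # hypothetical data


def search(name_list, search_name, max_index):
    """Return the record number (counting from 1) of search_name, or None."""
    for index in range(max_index):
        if name_list[index] == search_name:
            return index + 1
    return None


print(search(names, "Dan", 7))            # found before the faulty index is reached

try:
    search(names, "Edith", 7)             # the loop reaches index 6, which does not exist
except IndexError as error:
    print("Logic error:", error)

print(search(names, "Edith", len(names))) # corrected limit: no crash, returns None
```

Using len(names) instead of a hand-typed number is one way to stop the limit and the list from getting out of step.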
Typical debugging options in an IDE


The table lists common debugging tasks and their commands.


[Table: debugging commands. The legible entries are Step In, which executes the next unit of code and then stops at the subsequent line; Step Over; and a command that pauses debugging an application at the current execution point.]


Figure 57.4


Test strategies 


There are several different test strategies used by software development companies, some of which
are applicable to smaller software projects that you may have written. Test strategies are discussed in
Section 3, Chapter 11.


Testing your own software


You will have implemented several algorithms in your practical sessions. Testing your solutions for 
correctness can be a complex and time-consuming task, but one that needs to be done thoroughly 
and systematically. 


The purpose of testing is not to show that your program usually works correctly if the user is careful
when entering input data. The purpose of testing is to try to uncover undetected errors.


Devising a test plan 


Your program should work correctly whatever data is input. If invalid data is entered, the program 
should detect and report this, and ask the user to enter valid data. Some data may be valid, but may 
nevertheless cause the program to crash if you have not allowed for particular values. 
We need to choose test data that will test the outcome for any user input. To do this, we need to select
normal, boundary and erroneous data.


• normal data is data within the range that you would expect, and of the data type (real, integer, string,
etc.) that you would expect. For example, if you are expecting an input between 0 and 100, you
should test 1 and 99


• boundary data is data at the ends of the expected range. For example, test 0 and 100 to make sure
that these give the expected results if the valid range is between 0 and 100


• erroneous data is data that is either just outside an expected range, e.g. -1, 101, or is of the wrong
data type, for example non-numeric characters when you are expecting a number to be input


For each test, you should specify the purpose of the test, the expected result and the actual result. 
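The three kinds of test data can be written down as a small table and run automatically. The Python sketch below is only an illustration: the function name is_valid_mark, the whole-number rule and the chosen test values are assumptions made for this example, not part of the specification.

```python
def is_valid_mark(text):
    """Return True if text is a whole number between 0 and 100 inclusive."""
    try:
        mark = int(text)
    except ValueError:
        return False  # wrong data type, e.g. "abc" or "31.5"
    return 0 <= mark <= 100


# Each row records the purpose of the test, the test data and the expected result.
test_plan = [
    ("normal", "1", True),
    ("normal", "99", True),
    ("boundary", "0", True),
    ("boundary", "100", True),
    ("erroneous", "-1", False),
    ("erroneous", "101", False),
    ("erroneous", "abc", False),
]


def run_test_plan(plan):
    """Run every test and record the actual result next to the expected one."""
    results = []
    for purpose, data, expected in plan:
        actual = is_valid_mark(data)
        results.append((purpose, data, expected, actual, actual == expected))
    return results


for row in run_test_plan(test_plan):
    print(row)
```

Each printed row ends with True when the actual result matches the expected result, so a failing test is easy to spot.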


Example 1 


The following algorithm is intended to calculate and print the average mark for each student in a class, for 
all the tests they have attempted: 


// average mark
students = input("How many students? ")


for n = 1 to students
    name = input("Enter student name ")
    totalMarks = input("Enter total marks for ", name)
    numTests = input("How many tests has this student taken? ")
    averageMark = round(totalMarks/numTests)
    print("Average mark = ", averageMark)
next n


The test plan will look something like this: 


Test | Test data | Purpose of test | Result
1 | Number of students = 4 for tests 1-4; Jo: total marks 27, tests 3 | Normal data, integer result |
2 | Tom: total marks 31, tests 4 | Normal data, non-integer result rounded up |
3 | Beth: total marks 28, tests 3 | Normal data, result rounded down |
4 | Amina: total marks 0, tests 0 | No tests taken | Program crashes
5 | Number of students abc | Test invalid data | Program terminates


You can probably think of some other input data that would make the program crash. For example, 
what if the user enters 31.5 for the total marks? The program should validate all user input, so some 
amendments will have to be made to the program before general release! 
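One possible amendment is sketched below in Python. It is an illustration under stated assumptions rather than the book's sample program: the function name average_mark, the decision to accept a non-integer total such as 31.5, and the use of None to signal invalid input are all choices made for this example.

```python
def average_mark(total_marks, num_tests):
    """Return the rounded average mark, or None if the input cannot be used."""
    try:
        total = float(total_marks)   # accepts "31" and "31.5"
        tests = int(num_tests)       # the number of tests must be a whole number
    except ValueError:
        return None                  # non-numeric input such as "abc"
    if tests <= 0 or total < 0:
        return None                  # avoids division by zero when no tests are taken
    return round(total / tests)


print(average_mark("27", "3"))    # 9
print(average_mark("31", "4"))    # 8
print(average_mark("28", "3"))    # 9
print(average_mark("0", "0"))     # None
print(average_mark("abc", "3"))   # None
```

A calling program would ask the user to re-enter the data whenever None is returned, instead of crashing.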
Q3: Devise a test plan for the following program.
function code(message, shift)
    message = lowercase(message)
    codedMessage = ""
    for x in message
        if x in "abcdefghijklmnopqrstuvwxyz"
            num = ord(x)              # convert to ASCII value
            num = num + shift
            if num > ord("z")         # wrap if necessary
                num = num - 26
            endif
            char = chr(num)           # convert back to character
            codedMessage = codedMessage + char
        else
            codedMessage = codedMessage + x
        endif
    next x
    return codedMessage
endfunction


# main program
shift = 3
msg = input("Enter your message: ")
codedMessage = code(msg, shift)
print("The encoded message is: ", codedMessage)


Q4: What will be the output from the algorithm if the user inputs “Hi, Jo!”? 
Explain briefly the purpose of the algorithm. 


Dry-running a program 
A useful technique to locate an error in a program is to perform a dry run, with the aid of a trace table. 
As you follow through the logic of the program in the same sequence as the computer does, you note 
down in the trace table when each variable changes and what its value is. 
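A trace table can also be produced by the program itself while you are learning the technique. The short Python sketch below is an assumed example (the function trace_total and its test data are hypothetical): it records the value of each variable after every pass of a loop, which is exactly what you would write down by hand in a dry run.

```python
def trace_total(numbers):
    """Add up numbers, recording each variable after every pass of the loop."""
    trace = []
    total = 0
    for count, n in enumerate(numbers, start=1):
        total = total + n
        trace.append((count, n, total))  # one row of the trace table
    return trace


print("count  n  total")
for count, n, total in trace_total([4, 9, 2]):
    print(f"{count:5}  {n}  {total:5}")
```

For the test data 4, 9, 2 the total column reads 4, 13, 15, one row for each pass of the loop.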
Exercises
1. A Python program is run in an IDE and gives an incorrect result in the output pane, as shown:


 1  def SearchList(s, t):
 2      found = False
 3      n = 0
 4      while found == False and n < len(s):
 5          if t == s[n]:
 6              found = True
 7          else:
 8              n = n + 1
 9      return found
10
11  s = ("6", "3*", "3", "77", "16", "19", "35")
12  t = input("Please enter string to search for: ")
13  found = SearchList
14  if found:
15      print("String is in the list")
16  else:
17      print("String is not in the list")


Please enter string to search for: 81
String is in the list


(a) State what the expected output is.
(b) State two facilities an IDE might provide to help you find the error.
(c) Give the line number of the line that is causing the problem and write the correct statement.


(d) Apart from debugging aids, identify three features of an IDE that you might use when
developing a program.


2. Complete the trace table below to show how each variable changes when the algorithm is 
performed on the test data given. 
x = 0
y = 0
z = 0
w = input
repeat
    x = x + w
    y = y + 1
    w = input
until w < 0
z = x / y
print z


Test data: 5 7 22 4 - 
Chapter 58 — Use of object-oriented techniques


Objectives


• Be familiar with the basic concepts of object-oriented programming, such as classes, objects,
methods, attributes, inheritance, encapsulation and polymorphism


Procedural programming 


Programming languages have been evolving ever since the development of assembly languages.
High level languages such as Basic and Pascal are known as procedural languages, and a program
written in one of these languages consists of a series of step-by-step instructions on how to solve
the problem. This is usually broken down into a number of smaller modules, and the program then
consists of a series of calls to procedures or functions, each of which may in turn call other procedures
or functions.


In this method of programming, the data is held in separate primitive variables such as integer or char,
or in data structures such as array, list or string. The data may be accessible by all procedures in the
program (global variables) or local to a particular subroutine. Changes made to global data may affect
other parts of the program, either intentionally or unintentionally, and may mean other subroutines have to
be modified.


A-Level only


Object-oriented programming


In object-oriented programming, the world is viewed as a collection of objects. An object might be a
person, animal, place or event, for example. It could be something more abstract like a bank account or
a data structure such as a stack or queue that the programmer wishes to implement.


An object-oriented program is composed of a number of interacting objects, each of which is responsible 
for its own data and the operations on that data. Program code in an object-oriented program creates 
the objects and allows the objects to communicate with each other by sending messages and receiving 
answers. All the processing that is carried out in the program is done by objects. 


Object attributes and behaviours 


Each object will have its own attributes. The attributes of a car might include its make, engine size, 
colour, etc. The attributes of a person could include first name, last name, date of birth. 


An object has a state. A radio, for example, may be on or off, tuned to a particular station, set to a 
certain volume. A bank account may have a particular balance, say £54.20 and a credit limit of £300. 


An object has behaviours. These are the actions that can be performed by an object; for example, a cat 
can walk, pounce, catch mice, purr, miaow and so on. 
Classes


A class is a blueprint or template for an object, and it defines the attributes and behaviours (known as
methods) of objects in that class. An attribute is data that is associated with the class, and a method is
a functionality of the class: something that it can do, or that can be done with it.


For example, a stock control system might be used by a bookshop for recording the items that it receives
into stock from suppliers and sells to customers. The only information that the stock class will hold in
this simplified system is the stock ID number, stock category (books, stationery, etc.), description, and
quantity in stock.


Part of a sample definition of a class named StockItem is shown below. Program coding will vary
according to the language used.


// Stock class used to model a simple stock control system,
// allowing stock to be added and sold.
class StockItem
    // instance variables (properties/attributes)
    private stockID
    private category
    private description
    private qtyInStock

    // A procedure may take one or more parameters. It does not return a value.
    // A procedure with the name new is a constructor.
    public procedure new(aStockID, aCategory, aDescription, aQty)
        (instructions)
    endprocedure
    public procedure ReceiveStock(integer aQty)
        (instructions)
    endprocedure
    public procedure SellStock(integer aQty)
        (instructions)
    endprocedure

    // A function may take one or more parameters. It returns a value.
    public function GetQtyInStock
        (instructions)
    endfunction
endclass


As a general rule, instance variables or attributes are declared private and most methods public, so that
other classes may use the methods of a class but may not see or change its attributes.
This principle of information hiding, where a class cannot directly access the attributes of another class
when they are declared private, is an important feature of object-oriented programming.


A constructor is used to create objects in a class. In this pseudocode, a procedure with the name new
is a constructor.
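The StockItem class could be written in Python along the following lines. This is a minimal sketch, not the book's sample program: the method bodies, the check in sell_stock and the snake_case names are assumptions, and Python marks attributes as private only by convention, using a leading underscore.

```python
class StockItem:
    """Models one line of stock in a simple stock control system."""

    def __init__(self, stock_id, category, description, qty):
        # The constructor: runs when a new StockItem object is created.
        self._stock_id = stock_id
        self._category = category
        self._description = description
        self._qty_in_stock = qty

    def receive_stock(self, qty):
        """Procedure (setter): changes the state and returns nothing."""
        self._qty_in_stock += qty

    def sell_stock(self, qty):
        """Procedure (setter): assumed rule that stock may not go negative."""
        if qty > self._qty_in_stock:
            raise ValueError("not enough stock")
        self._qty_in_stock -= qty

    def get_qty_in_stock(self):
        """Function (getter): returns a value without changing the state."""
        return self._qty_in_stock


book1 = StockItem("PT123", "Book", "Computer Science", 35)
book1.sell_stock(3)
book1.receive_stock(10)
print(book1.get_qty_in_stock())  # 42
```

Code outside the class changes the quantity only through receive_stock and sell_stock, which is where any rules about valid quantities can be enforced.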


Instantiation (creating an object) 


Once the class and its constructor have been defined, and each of the methods coded, we can start 
creating and naming actual objects. The creation of a new object (an instance of a class) is known as 
instantiation. Multiple instances of a class can be created which each share identical methods and 
attributes, but the values of those attributes will be unique to each instance. 


Suppose we want to create a new stock item called book1. The type of variable to assign to book1 has
to be stated. This will be the class name, StockItem. The word new is typically used to instantiate
(create) a new object in the class.


book1 = new StockItem("PT123", "Book", "Computer Science", 35)


book1 is called a reference type variable, or simply a reference variable. Note that this is a different type
of variable from stockID or qtyInStock, which are string or integer variables.


Like primitive variables of type integer, double, char, string, reference variables are named
memory locations in which you can store information. However, a reference variable does not hold the
object; it holds a pointer or reference to where the object itself is stored.
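The difference between an object and a reference to it can be seen in a few lines of Python. In this assumed example the StockItem class is cut down to a single public attribute to keep the sketch short; the variable names same_book and other_book are hypothetical.

```python
class StockItem:
    def __init__(self, qty):
        self.qty_in_stock = qty


book1 = StockItem(35)
same_book = book1            # copies the reference, not the object
other_book = StockItem(35)   # a separate object holding an equal value

same_book.qty_in_stock = 20  # changes the one object that both names refer to

print(book1.qty_in_stock)       # 20
print(other_book.qty_in_stock)  # 35
print(book1 is same_book)       # True
print(book1 is other_book)      # False
```

Assigning one reference variable to another gives two names for the same object, so a change made through either name is visible through both.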


A variable reference diagram shows in graphical form the new StockItem object referenced by the
variable book1. In the diagram, reference variables are shown as circles and primitive data types (and
string variables) are shown as rectangles.


[Diagram: the reference variable book1 (StockItem) points to a StockItem object with
StockID PT123, Category Book, Description Computer Science, QtyInStock 35]


Sending messages 


Messages can be categorised as either “getter” or “setter” messages. In some languages, “getter” 
messages are written as functions which return an answer, and “setter” messages as procedures 
which change the state of an object. This is reflected in the pseudocode used in this book. 


The state of an object can be examined or changed by sending it a message, for example to get or 
increase the quantity in stock. To get the quantity in stock of book1, for example, you could write: 


quantity = book1.GetQtyInStock


To record the sale of three copies of book1, you could write:


book1.SellStock(3)
Q2: In the class definition for Radio (shown in the figure below), add the missing instance variables,
and a procedure to set volume.
class Radio
    // instance variables
    private volume
    // insert more instance variables here

    public procedure new(aVolume, aStation, aSwitch)
        volume = aVolume
        station = aStation
        switch = aSwitch
    endprocedure

    public procedure setVolume(aVolume)
        (instructions)
    endprocedure
endclass


Q3: Write pseudocode statements to instantiate two new radio objects named robertsRadio and
philipsRadio.


[Figure: A radio modelled as a software object. States shown: tuned to Radio Suffolk, volume 3;
tuned to Radio 1, volume 5; off]


Each object belongs to a class, and all the objects in the same class have the same structure and methods 
but they each have their own data. Objects belonging to a class are called instances of the class. 


Encapsulation 


An object encapsulates both its state (the values of its instance variables) and its behaviours or methods.
All the data and methods of each object are wrapped up into a single entity so that the attributes and
behaviours of one object cannot affect the way in which another object functions. For example, setting the
volume of the philipsRadio object to 5 has no effect on any other radio object.


Encapsulation is a fundamental principle of object-oriented programming and is very powerful.
It means, for example, that in a large project different programmers can work on different classes and
not have to worry about how other parts of the system may affect any code they write. They can also
use methods from other classes without having to know how they work.




Related to encapsulation is the concept of information hiding, whereby details of an object's instance 
variables are hidden so that other objects must use messages to interact with that object's state. 


To change the volume of the Roberts radio, for example, a programmer might write: 


robertsRadio.setVolume (5) 


A programmer using the method does not need to know how this is achieved. The documentation of 
each method will specify the number and variable type of any arguments that need to be passed to the 
method, and what value, if any, is returned by the method. The attribute volume cannot be seen or 
changed directly; it can only be changed by sending a message (i.e. invoking the method). 
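As a hedged illustration, the Python sketch below hides the volume behind a setter method. The volume limit of 10, the method names and the decision to raise an error for an invalid volume are assumptions made for this example; they are not taken from the Radio class in the text.

```python
class Radio:
    MAX_VOLUME = 10  # assumed limit, chosen for the example

    def __init__(self, volume, station, switched_on):
        self._volume = volume
        self._station = station
        self._switched_on = switched_on

    def set_volume(self, volume):
        """Setter message: the only way other code should change the volume."""
        if not 0 <= volume <= self.MAX_VOLUME:
            raise ValueError("volume out of range")
        self._volume = volume

    def get_volume(self):
        """Getter message: returns the current volume."""
        return self._volume


roberts_radio = Radio(3, "Radio Suffolk", True)
philips_radio = Radio(5, "Radio 1", True)

roberts_radio.set_volume(7)
print(roberts_radio.get_volume())  # 7
print(philips_radio.get_volume())  # 5
```

Changing the volume of one radio leaves the other untouched, and the range check is written once, inside the class, where every caller benefits from it.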


Inheritance 


Classes can inherit data and behaviour from a parent class in much the same way that children can
inherit characteristics from their parents. A "child" class in an object-oriented program is referred to as a
subclass, and a "parent" class as a superclass.


For example, we could draw an inheritance hierarchy for animals that feature in a computer game. Note
that the inheritance relationship in the corresponding inheritance diagram is shown by an unfilled arrow
at the "parent" end of the relationship.


[Figure: Class diagram involving inheritance, with Animal as the superclass and Rodent among its subclasses]


All the animals in the superclass Animal share common attributes such as name and position.
Animals may also have common procedures (methods), such as moveLeft, moveRight. A Cat may
have an extra attribute size, and an extra method pounce. A Rodent may have an extra method
gnaw. A Beaver may have an extra method, makeDam.


When to use inheritance


There is a simple rule to determine whether inheritance is appropriate in a program, called the "is a" rule,
which requires an object to have a relationship to another object before it can inherit from the object.
This rule asks, in effect, "Is object A an object B?" For example, "Is a Cat an Animal?" "Is a Mouse
a Rodent?" Technically, there is nothing to stop you coding a program in which a man inherits the
attributes and methods of a mouse, but this is going to cause confusion for users!
Coding inherited classes
Common behaviour can be defined in a superclass and inherited into a subclass.


The class Animal may be defined like this:


Class Animal
    private name
    private position
    public procedure new(aName, aPosition)
        name = aName
        position = aPosition
    endprocedure
    public procedure moveLeft(steps)
        position = position - steps
        (etc)
    endprocedure
    public function getPosition
        code for function
    endfunction
endclass


To code the class header for Cat, which is a subclass of Animal, in pseudocode we could write
something like


Class Cat inherits Animal
    private size
    public procedure new(aName, aPosition, aSize)
        super.new(aName, aPosition)
        size = aSize
    endprocedure
endclass
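The same superclass and subclass might look like this in Python. It is a sketch under assumptions: the pounce method body and the example values are invented for illustration, and the Cat constructor is assumed to pass both name and position up to the Animal constructor.

```python
class Animal:
    def __init__(self, name, position):
        self._name = name
        self._position = position

    def move_left(self, steps):
        self._position = self._position - steps

    def get_position(self):
        return self._position


class Cat(Animal):  # Cat inherits from Animal
    def __init__(self, name, position, size):
        super().__init__(name, position)  # run the superclass constructor first
        self._size = size

    def pounce(self):
        # An extra method that only cats have.
        return self._name + " pounces"


tom = Cat("Tom", 10, "small")
tom.move_left(3)           # inherited from Animal
print(tom.get_position())  # 7
print(tom.pounce())        # Tom pounces
```

Cat defines only what is new (size and pounce); everything else is reused from Animal without being rewritten.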


Polymorphism
Polymorphism refers to a programming language's ability to process objects differently depending on
their class. For example, all objects in subclasses of Animal can execute the methods moveLeft,
moveRight, which will cause the animal to move one space left or right.


We might decide that a Cat should move three spaces when a moveLeft or moveRight message is
received, and a Rodent should move two spaces. We can define different methods within each of the
classes to implement these moves, but keep the same method name for each class.


Defining a method with the same name and formal argument types as a method inherited
from a superclass is called overriding. In the example above, the moveLeft method in
each of the Cat and Rodent classes overrides the method in the superclass Animal.
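Overriding can be demonstrated with a short Python sketch. The classes below are simplified for the example (a single public position attribute and a move_left method with no parameters are assumptions), but the distances moved follow the description above.

```python
class Animal:
    def __init__(self, position):
        self.position = position

    def move_left(self):
        self.position -= 1  # default behaviour: one space


class Cat(Animal):
    def move_left(self):    # overrides Animal.move_left
        self.position -= 3


class Rodent(Animal):
    def move_left(self):    # overrides Animal.move_left
        self.position -= 2


animals = [Animal(10), Cat(10), Rodent(10)]
for animal in animals:
    animal.move_left()      # the same message is sent to every object

positions = [animal.position for animal in animals]
print(positions)  # [9, 7, 8]
```

The loop does not need to know which class each object belongs to; each object responds to the same message with its own version of the method.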
[Figure: the moveLeft and moveRight methods move an Animal one space, a Cat three spaces
and a Rodent two spaces]


Exercises 


1. A sports club keeps details of its members. Each member has a unique membership number, first 
name, surname and telephone number recorded. Three classes have been identified: 


Member 
JuniorMember 
SeniorMember 


The classes JuniorMember and SeniorMember are related, by single inheritance, to 
the class Member. 


(a) Draw an inheritance diagram for the given classes. [2] 


(b) Programs that use objects of the class Member need to create a new member, edit a
member's details, delete a member's details, and show a member's details. No other form
of access is to be allowed.


Complete the definition of the attributes and the procedure new for the Member class.
Class Member
    private memberNumber

    public procedure new(aMemberNumber, aFirstName, aSurname, aTel)
        memberNumber = aMemberNumber

    endprocedure
endclass [1]
(c) In object-oriented programming, what is meant by encapsulation? [1]
(d) (i) What is meant by instantiation of an object? [2]


(ii) Write a statement to create a new Member object with membership number A456,
first name John, surname Bell, telephone number 07981 345987. [2]
2. (a) In an object-oriented computer game there is a class called Crawlers. Two subclasses
of Crawlers are Spiders and Bugs. Draw an inheritance diagram for this. [2]


(b) For the subclass Spiders suggest:
(i) one attribute


(ii) one method [2]


3. (a) In object-oriented programming, what is meant by polymorphism? [2] 


(b) An object-oriented program stores details of a class Bird and a subclass Seabird,
defined as follows:


Class Bird
    public procedure move
        system.print("Birds can fly")
    endprocedure
endclass


Class Seabird inherits Bird
    public procedure move (override)
        system.print("Seabirds can fly and swim")
    endprocedure
endclass


Two new objects are instantiated with the lines:


bird1 = new Bird()
bird2 = new Seabird()


(i) What will be printed when the following lines are executed?


bird1.move
bird2.move [2]


(ii) Explain your answer. [2] 
Section 12


Algorithms


In this section: 


Chapter 59 Analysis and design of algorithms 
Chapter 60 Searching algorithms 

Chapter 61 Bubble sort and insertion sort 
Chapter 62 Merge sort and quick sort 
Chapter 63 Graph traversal algorithms 


Chapter 64 Optimisation algorithms 
Chapter 59 — Analysis and design of algorithms


Objectives


• Analyse the suitability of different algorithms for a given task and data set
• Be familiar with measures and methods to determine the efficiency of different algorithms
• Define constant, linear, polynomial, exponential and logarithmic functions
• Use Big-O notation to compare the time complexity of algorithms
• Be able to derive the time complexity of an algorithm


Comparing algorithms 


Algorithms may be compared on how much time they need to solve a particular problem. This is referred 
to as the time complexity of the algorithm. The goal is to design algorithms which will run quickly while 
taking up the minimal amount of resources such as memory. 


In order to compare the efficiency of different algorithms in terms of execution time, we need to quantify 
the number of basic operations or steps that the algorithm will need, in terms of the number of items to 
be processed. 


For example, consider these two algorithms, which both calculate the sum of the first n integers. 


function sumIntegersMethod1(n)
    sum = 0
    for i = 1 to n
        sum = sum + i
    next i
    return sum
endfunction


The second algorithm computes the same sum using a different method:


function sumIntegersMethod2(n)
    sum = n * (n + 1) / 2
    return sum
endfunction


The first algorithm performs one operation (sum = 0) outside the loop and n operations inside the for
loop, a total of n + 1 operations. As n increases, the extra operation to initialise sum becomes insignificant,
and the larger the value of n, the longer this algorithm takes. Its order of magnitude or time
complexity is basically n. The second algorithm, on the other hand, takes the same amount of time
whatever the value of n. Its time complexity is a constant.
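The two methods can be compared directly in Python. This is a minimal sketch: the function names are adapted from the pseudocode, and integer division (//) is an assumed choice so that the formula returns a whole number.

```python
def sum_integers_method1(n):
    """Adds the integers one at a time: n additions, so the time grows with n."""
    total = 0
    for i in range(1, n + 1):
        total = total + i
    return total


def sum_integers_method2(n):
    """Uses the formula n(n + 1)/2: the same few operations whatever n is."""
    return n * (n + 1) // 2


print(sum_integers_method1(100))  # 5050
print(sum_integers_method2(100))  # 5050
```

Both functions return the same answer, but the first performs a loop of n steps while the second performs one multiplication, one addition and one division.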


We will return to this idea later in the chapter, but first, we need to look at some of the maths involved in 
calculating the time complexity of different algorithms. 




Introduction to functions 


The order of magnitude, or time complexity, of an algorithm can be expressed as a function of its size. 


A function maps one set of values onto another. 


INPUT x → FUNCTION f → OUTPUT f(x)


A linear function
A linear function is expressed in general terms as f(x) = ax + c


Values of the function f(x) = 3x + 4 are shown below for x = 1, 10, 100, 10,000


x | 3x | f(x) = 3x + 4
1 | 3 | 7
10 | 30 | 34
100 | 300 | 304
10,000 | 30,000 | 30,004


Notice that the constant term has proportionally less and less effect on the value of the function as
the value of x increases. The only term that is significant is 3x, and f(x) increases in a straight line
as x increases.


A polynomial function
A polynomial expression is expressed as f(x) = axᵐ + bx + c


Values of the function f(x) = 2x² + 10x + 50 are shown below for x = 1, 10, 100, 10,000


x | x² | 2x² | 10x | f(x) = 2x² + 10x + 50
1 | 1 | 2 | 10 | 62
10 | 100 | 200 | 100 | 350
100 | 10,000 | 20,000 | 1,000 | 21,050
10,000 | 100,000,000 | 200,000,000 | 100,000 | 200,100,050


The values of b and c have a smaller and smaller effect on the answer as x increases, compared with
the value of a. The only term that really matters is the term in x², if we are approximating the value of the
function for a large value of x.


An exponential function
An exponential function takes the form f(x) = abˣ. This function grows very large, very quickly!
A logarithmic function
A logarithmic function takes the form f(x) = a logₙ x
"The logarithm of a number is the power that the base must be raised to make it equal to the number."
Values of the function f(x) = log₂ x are shown below for x = 1, 8, 1,024, 1,048,576.


x | log₂ x
1 | 0
8 | 3
1,024 | 10
1,048,576 | 20
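To see how differently the four kinds of function grow, their values can be calculated side by side. In the Python sketch below, the linear and polynomial functions are the ones used above, while the exponential function 2ˣ (a = 1, b = 2) and the sample values of x are assumptions chosen for the example.

```python
import math


def growth_row(x):
    """Return the value of each kind of function for one value of x."""
    return {
        "linear": 3 * x + 4,
        "polynomial": 2 * x**2 + 10 * x + 50,
        "exponential": 2**x,           # assumed example with a = 1, b = 2
        "logarithmic": math.log2(x),
    }


for x in (1, 8, 16, 32):
    print(x, growth_row(x))
```

By x = 32 the exponential function has passed four billion while the logarithmic function has only reached 5.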


Permutations 


The permutation of a set of objects is the number of ways of arranging the objects. For example, if you 
have 3 objects A, B and C you can choose any of A, B or C to be the first object. You then have two 
choices for the second object, making 3 x 2 = 6 different ways of arranging the first two objects, and 
then just one way of placing the third object. The six permutations are ABC, ACB, BAC, BCA, CAB, CBA. 


The formula for calculating the number of permutations of four objects is 4 x 3 x 2 x 1, written 4! and 
spoken as “four factorial”. (Note that 10! = 3.6 million... so don’t try getting 10 students to line up in all 
possible ways!) 
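Python's standard library can list permutations and calculate factorials, which makes it easy to check these figures. The snippet is a small illustration only; the variable name arrangements is arbitrary.

```python
import math
from itertools import permutations

# All the ways of arranging the three objects A, B and C.
arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements)        # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(arrangements))   # 6

print(math.factorial(4))   # 24
print(math.factorial(10))  # 3628800
```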


Big-O notation 


Now that we have got all the maths out of the way and hopefully understood, we can study the so-called
Big-O notation which is used to express the time complexity, or performance, of an algorithm. ('O'
stands for 'Order'.)


The best way to understand this notation is to look at some examples. 


O(1) (Constant time)


O(1) describes an algorithm that takes constant time (the same amount of time) to execute regardless
of the size of the input data set.


Suppose array a has n items. The statement
length = len(a)
will take the same amount of time to execute however many items are held in the array.


O(n) (Linear time)
O(n) describes an algorithm whose performance will grow in linear time, in direct proportion to the size
of the data set. For example, a linear search of an array of 1000 unsorted items will take 1000 times
longer than searching an array of 1 item.


O(n²) (Polynomial time)
O(n²) describes an algorithm whose performance is directly proportional to the square of the size of the
data set. A program with two nested loops each performed n times will typically have an order of time
complexity O(n²). The running time of the algorithm grows in polynomial time.
O(2ⁿ) (Exponential time)


O(2ⁿ) describes an algorithm where the time taken to execute will double with every additional item added
to the data set. The execution time grows in exponential time and quickly becomes very large.


O(log n) (Logarithmic time)
The time taken to execute an algorithm of order O(log n) (logarithmic time) will grow very slowly as the
size of the data set increases. A binary search is a good example of an algorithm of time complexity
O(log n). Doubling the size of the data set has very little effect on the time the algorithm takes to complete.


[Figure: Graphs of log n, n, n², 2ⁿ, showing time plotted against n]


Calculating the time complexity of an algorithm 


Here are two different algorithms for finding the smallest element in an array called arrayX of size n.
Assume the index starts at 0.


The first algorithm assigns the first value in the array to a variable called minimum. It then compares
each subsequent item in the array to minimum, and if it is smaller, replaces minimum with the new
lowest value.


minimum = arrayX[0]
for k = 1 to n - 1
    if arrayX[k] < minimum then
        minimum = arrayX[k]
    endif
next k


To calculate the time complexity of the algorithm in Big-O notation, we need to count the number of
basic operations it performs. There is one initial statement, and the if statement is performed n - 1 times,
giving n operations in total. The number of operations grows in direct proportion to n, so this algorithm
executes in linear time and has time complexity O(n).


The second algorithm compares each value in the array to all the other values of the array, and if the 
current value is less than or equal to all the other values in the array then it is the minimum. 


for k = 0 to n - 1
    isMinimum = True
    for j = 0 to n - 1
        if arrayX[k] > arrayX[j] then
            isMinimum = False
        endif
    next j
    if (isMinimum) then
        minimum = arrayX[k]
    endif
next k


To calculate the time complexity of this algorithm, we count the number of basic operations it performs.


There are two basic operations in the outer loop (isMinimum = True and the final if statement)
which are each performed n times. The inner loop has one basic operation performed n² times.


This gives us a time complexity of 2n + n², but as discussed earlier, the only significant term is the one
in n². The time complexity is therefore O(n²).
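Counting the comparisons made by each algorithm shows the difference in practice. The Python sketch below is an assumed illustration: both functions return the minimum together with the number of comparisons made, the arrays are indexed from 0, and the eight-item test list is invented for the example.

```python
def min_linear(array_x):
    """First algorithm: one pass through the array, n - 1 comparisons."""
    comparisons = 0
    minimum = array_x[0]
    for k in range(1, len(array_x)):
        comparisons += 1
        if array_x[k] < minimum:
            minimum = array_x[k]
    return minimum, comparisons


def min_quadratic(array_x):
    """Second algorithm: compares every item with every item, n * n comparisons."""
    comparisons = 0
    minimum = None
    for k in range(len(array_x)):
        is_minimum = True
        for j in range(len(array_x)):
            comparisons += 1
            if array_x[k] > array_x[j]:
                is_minimum = False
        if is_minimum:
            minimum = array_x[k]
    return minimum, comparisons


data = [7, 3, 9, 1, 5, 8, 2, 6]
print(min_linear(data))     # (1, 7)
print(min_quadratic(data))  # (1, 64)
```

For 8 items the counts are 7 and 64; for 1,000 items they would be 999 and 1,000,000.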
1. Assuming a is an array of n elements, compute the time complexity of the following algorithm.
Explain how you arrive at your answer.


duplicate = False
for i = 0 to n - 2
    for j = i + 1 to n - 1
        if a[i] = a[j] then duplicate = True
    next j
next i [3]


2. (a) Complete the following table showing values of f(n): [4]

[Table: values of f(n) = n! to be completed; the only legible entry in the source is 479,001,600]

(b) Place the following algorithms in order of time complexity, with the most efficient
algorithm first. [2]
Algorithm A of time complexity O(n)
Algorithm B of time complexity O(2ⁿ)
Algorithm C of time complexity O(log n)
Algorithm D of time complexity O(n²)
Algorithm E of time complexity O(n!)
(c) Explain why algorithms with time complexity O(n!) are generally considered not to be helpful
in solving a problem. Under what circumstances would such an algorithm be considered? [3]
Chapter 60 — Searching algorithms


Objectives


• Write and trace algorithms for linear search and binary search
• Analyse the time complexity of the linear search and binary search algorithms (A-Level only)
• Describe and trace the binary tree search algorithm (A-Level only)


Linear search 


Sometimes it is necessary to search for items in a file, or in an array in memory. If the items are not in any 
particular sequence, the data items have to be searched one by one until the required one is found or the 
end of the list is reached. This is called a linear search. 


The following algorithm for a linear search of a list or array alist (indexed from 0) returns the index of
itemSought if it is found, -1 otherwise.


function linearSearch(alist, itemSought)
    index = -1
    i = 0
    found = False
    while i < length(alist) and found = False
        if alist[i] = itemSought then
            index = i
            found = True
        endif
        i = i + 1
    endwhile
    return index
endfunction


A-Level only
Time complexity of linear search
We can determine the algorithm's efficiency in terms of execution time, expressed in Big-O notation.
To do this, you need to compute the number of operations that the algorithm will require for n items.
The loop is performed n times for a list of length n, and there are two steps in the loop (an IF statement
and an assignment statement), giving a total of 3 + 2n steps (including 3 steps at the start). The constant
term and the coefficient of n become insignificant as n increases in size, and the time complexity of the
algorithm basically depends on how often the loop has to be performed in the worst-case scenario.


Therefore, the time complexity of the linear search is O(n).
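A Python version of the linear search, extended to count how many items it examines, illustrates the worst case. The extra steps counter and the tuple return value are assumptions added for this sketch; the search logic follows the pseudocode above.

```python
def linear_search(a_list, item_sought):
    """Return (index, steps): the index of the item, or -1, and the items examined."""
    index = -1
    i = 0
    found = False
    steps = 0
    while i < len(a_list) and not found:
        steps += 1
        if a_list[i] == item_sought:
            index = i
            found = True
        i = i + 1
    return index, steps


marks = [5, 3, 8, 1]
print(linear_search(marks, 8))  # (2, 3)
print(linear_search(marks, 7))  # (-1, 4)
```

When the item is not in the list, every one of the n items is examined, which is the worst case that gives the algorithm its O(n) time complexity.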
Binary search


The binary search is a much more efficient method of searching a list for an item than a linear search, but
crucially, the items in the list must be sorted. If they are not sorted, a linear search is the only option.


The algorithm works by repeatedly dividing in half the portion of the data list that could contain the
required data item. This is continued until there is only one item in the list.


Consider the following ordered list where we wish to search for data item 50.


[seven smaller items, not legible in the source] | 43 | 48 | 50 | 60 | 64 | 77 | 81 | 90 | 98


Stage 1: middle term is 43; we can therefore discard all data items less than or equal to 43. Note that
the middle item of an even number of items is obtained by rounding down; the middle item of 16 items is
item 8.


48 | 50 | 60 | 64 | 77 | 81 | 90 | 98


Stage 2: middle term is 64, so we can discard all data items greater than or equal to 64.


Stage 3: middle term is 50, so we have found the data item.


Q2: Suppose we have the following sorted list: 


3s | o | o [1s] v2] 14 | 16 | 17] 10 


Which one of the following is the correct sequence of comparisons when used to locate the
data item 8?
(i) 12, 6, 8
(ii) 11, 5, 6, 8


Q3: Ask a friend to think of a number between 1 and 1000. Then use a binary search algorithm to 
guess the number. How many different guesses will you need, at most? 


Q4: Look at the following data list. Which items will you examine in (a) a linear search and 
(b) a binary search to find the following data items? 


(i) 27 
(ii) 11
(iii) 60 
Binary search algorithm
Below is an algorithm for the binary search on an array aList of n items.

The ordered array is divided into three parts: a middle item, the first part of the array starting at aList[0]
up to the middle item, and the second part starting after the middle item and ending with the final item in
the list. The middle item is examined to see if it is equal to the sought item.

If it is not, then if the middle item is greater than the sought item, the second half of the array is of no
further interest. The number of items being searched is therefore halved and the process repeated until
the last item is examined, with either the first or second half of the array of items being eliminated at
each pass.

first, last and midpoint are integer variables used to index elements of the array. The variable 
first will start at 0, the beginning of the array. The variable last starts at len(aList) - 1, the last 
array index. 


function binarySearch(aList, itemSought)
    found = False
    index = -1
    first = 0
    last = len(aList) - 1
    while first <= last and found = False
        midpoint = Integer part of ((first + last) / 2)
        if aList[midpoint] = itemSought then
            found = True
            index = midpoint
        else
            if aList[midpoint] < itemSought then
                first = midpoint + 1
            else
                last = midpoint - 1
            endif
        endif
    endwhile
    return index    // index = -1 if item not found
endfunction
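
The same iterative logic can be sketched in Python. This is only an illustration: the name binary_search is an assumption, the sample data is the upper half of the ordered list used in the worked example, and integer division (//) stands in for "Integer part of".

```python
def binary_search(a_list, item_sought):
    """Return the index of item_sought in the sorted list a_list, or -1."""
    first = 0
    last = len(a_list) - 1
    while first <= last:
        midpoint = (first + last) // 2       # rounds down, as in the text
        if a_list[midpoint] == item_sought:
            return midpoint
        if a_list[midpoint] < item_sought:
            first = midpoint + 1             # discard the first part
        else:
            last = midpoint - 1              # discard the second part
    return -1


data = [48, 50, 60, 64, 77, 81, 90, 98]
print(binary_search(data, 50))   # 1
print(binary_search(data, 49))   # -1
```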
Time complexity of binary search
The binary search halves the search area with each execution of the loop, an excellent example of a
divide and conquer strategy. If we start with n items, there will be approximately n/2 items left after
the first comparison, n/4 after 2 comparisons, n/8 after 3 comparisons, and n/2^i after i comparisons.
The number of comparisons needed to end up with a list of just one item is i where n/2^i = 1. One further
comparison would be needed to check if this item is the one being searched for or not.


Solving this equation for i, n = 2^i
Taking the logarithm of each side, log₂ n = i log₂ 2, giving i = log₂ n (since log₂ 2 = 1)


Therefore, the binary search is O(log n). 
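
To check this result numerically, the sketch below compares the formula (log₂ n rounded down, plus the one further comparison) with a direct count of loop passes for a search that always continues into the upper part of the list. Both function names are assumptions made for this illustration.

```python
import math


def max_comparisons(n):
    """Worst-case number of items examined in a binary search of n items."""
    return math.floor(math.log2(n)) + 1


def count_loop_passes(n):
    """Count passes of the loop when the sought item is larger than every item."""
    first, last, passes = 0, n - 1, 0
    while first <= last:
        midpoint = (first + last) // 2
        first = midpoint + 1      # always discard the lower part
        passes += 1
    return passes


print(max_comparisons(16), count_loop_passes(16))                # 5 5
print(max_comparisons(1_000_000), count_loop_passes(1_000_000))  # 20 20
```

Doubling the size of the list adds only one more comparison, which is why the constant is ignored and the order is written O(log n).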
The basic concept of the binary search is in fact recursive, and a recursive algorithm is given below.
The procedure calls itself, eventually “unwinding” when the procedure ends. When recursion is used 
there must always be a condition that if true, causes the program to terminate the recursive procedure, 
or the recursion will continue forever. 


A recursive algorithm 


Once again, first, last and midpoint are integer variables used to index elements of the array, with
first starting at 0 and last starting at the upper limit of the array index.


function binarySearch(aList, itemSought, first, last)
    if last < first then
        return -1
    else
        midpoint = integer part of (first + last) / 2
        if aList[midpoint] > itemSought then
            // itemSought is in first half of list
            return binarySearch(aList, itemSought, first, midpoint - 1)
        else
            if aList[midpoint] < itemSought then
                // itemSought is in second half of list
                return binarySearch(aList, itemSought, midpoint + 1, last)
            else
                // itemSought has been found
                return midpoint
            endif
        endif
    endif
endfunction
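
A Python sketch of the recursive version follows. The name binary_search_recursive and the sample list are assumptions for illustration; the test last < first is the terminating condition that stops the recursion.

```python
def binary_search_recursive(a_list, item_sought, first, last):
    """Return the index of item_sought in a_list[first..last], or -1."""
    if last < first:
        return -1                                   # nothing left to search
    midpoint = (first + last) // 2
    if a_list[midpoint] > item_sought:
        # item_sought can only be in the first half
        return binary_search_recursive(a_list, item_sought, first, midpoint - 1)
    if a_list[midpoint] < item_sought:
        # item_sought can only be in the second half
        return binary_search_recursive(a_list, item_sought, midpoint + 1, last)
    return midpoint                                 # item_sought has been found


data = [48, 50, 60, 64, 77, 81, 90, 98]
print(binary_search_recursive(data, 81, 0, len(data) - 1))   # 5
print(binary_search_recursive(data, 10, 0, len(data) - 1))   # -1
```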
Binary tree search


The recursive algorithm for searching a binary tree is similar to the binary search algorithm above, except 
that instead of looking at the midpoint of a list, or a subset of the list, on each pass, half of the tree or 
subtree is eliminated each time its root is examined. 


In the tree below, a maximum of four nodes has to be examined to find a value or return “not found”.


function binarySearchTree(itemSought, currentNode)
    if currentNode = None then
        return False
    else
        if itemSought = item at currentNode then
            return True
        else
            if itemSought < item at currentNode then
                if left child exists then
                    return binarySearchTree(itemSought, left child)
                else
                    return False
                endif
            else
                if right child exists then
                    return binarySearchTree(itemSought, right child)
                else
                    return False
                endif
            endif
        endif
    endif
endfunction
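
One way to sketch this search in Python is shown below. The Node class, its attribute names and the sample tree are hypothetical, introduced only for this illustration. Because a missing child is stored as None, the first test in the function also covers the "child exists" checks.

```python
class Node:
    """A hypothetical binary tree node holding an item and two child links."""

    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right


def binary_search_tree(item_sought, current_node):
    """Return True if item_sought is in the subtree rooted at current_node."""
    if current_node is None:
        return False                      # fell off the tree: not found
    if item_sought == current_node.item:
        return True
    if item_sought < current_node.item:
        return binary_search_tree(item_sought, current_node.left)
    return binary_search_tree(item_sought, current_node.right)


root = Node(50, Node(30, Node(20), Node(40)), Node(70, Node(60), Node(80)))
print(binary_search_tree(60, root))   # True
print(binary_search_tree(65, root))   # False
```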


Time complexity of binary tree search 


Like the binary search, the number of items to be searched is halved with each pass through
the algorithm, provided the tree is reasonably balanced. The time complexity is then the same as
the binary search, i.e. O(log n).
Exercises


1. (a) Data structures may be described as static or dynamic. 
(i) State the meaning of the term static. 
(ii) State one type of data structure that is always considered static. 
(iii) State the meaning of the term dynamic.


(iv) Give one disadvantage of using a dynamic data structure. [4] 


(b) The list of positive even numbers up to and including 1000 is
2,4,6,... 500, 502, ... 998, 1000 
An attempt is to be made to find the number 607 in this list. 


Use the values given to show the first three stages for: 


(i) a binary search [3] 
(ii) a serial search [3] 
(iii) Explain the difference between binary searching and serial searching. [2] 


(iv) State one advantage and one disadvantage of a binary search compared with 
a serial search. [2] 


OCR F453/01 Qu 5 June 2014


2. The binary search method can be used to search for an item in an ordered list. 
(a) A list in alphabetical order contains 150 names. 


What is the maximum number of names that would need to be accessed to determine if a
particular name appears in the list? [1]


A-Level only
(b) Which of the following is the order of time complexity of the binary search method?


O(log₂ n)    O(n)    O(n²)    [1]
Chapter 61 — Bubble sort and insertion sort


Objectives


• Be able to describe the bubble sort and insertion sort algorithms


• Be able to trace the bubble sort and insertion sort algorithms


Sorting algorithms 


Sorting is a very common task in data processing, and frequently the number of items may be huge, so 
using a good algorithm can considerably reduce the time spent on the task. There are many different 
sorting algorithms and we will start by looking at a simple but inefficient example. 


Bubble sort 


The Bubble sort is one of the most basic sorting algorithms and the simplest to understand. The basic 
idea is to bubble up the largest (or smallest) item to the end of the list, then the second largest, then the 
third largest and so on until no more swaps are needed. 


Suppose you have an array of n items: 


• Go through the array, comparing each item with the one next to it. If it is greater, swap them.

• The last element of the array will be in the correct place after the first pass

• Repeat n-2 times, reducing by one on each pass the number of elements to be examined
Example 3 Working through the Bubble sort algorithm


The figure below shows how the items change order in the first pass, as the largest item ‘bubbles’ to the 
end of the list. Each time an item is larger than the next one, they change places. 


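Pass 1:  9 | 5 | 4 | 15 | 3 | 8 | 11     (starting order)
         5 | 9 | 4 | 15 | 3 | 8 | 11     (9 and 5 swapped)
         5 | 4 | 9 | 15 | 3 | 8 | 11     (9 and 4 swapped)
         5 | 4 | 9 | 3 | 15 | 8 | 11     (15 and 3 swapped)
         5 | 4 | 9 | 3 | 8 | 15 | 11     (15 and 8 swapped)
         5 | 4 | 9 | 3 | 8 | 11 | 15     (15 and 11 swapped)


After the first pass, the largest item is in the correct place at the end of the list. On the second pass, only
the first six numbers are checked.


Pass 2:  4 | 5 | 3 | 8 | 9 | 11 | 15


11 and 15 in the correct place; so only the first five numbers are checked.


Pass 3:  4 | 3 | 5 | 8 | 9 | 11 | 15


9, 11 and 15 in the correct place; so only the first four numbers are checked.


Pass 4:  3 | 4 | 5 | 8 | 9 | 11 | 15


8, 9, 11 and 15 in the correct place; so only the first three numbers are checked.


Pass 5:  3 | 4 | 5 | 8 | 9 | 11 | 15


Finally, the first two numbers are checked, and swapped if necessary.


Pass 6:  3 | 4 | 5 | 8 | 9 | 11 | 15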


Notice that in this case, no numbers were swapped on Pass 5. Therefore Pass 6 was not necessary. 

In order to avoid performing unnecessary passes on a list that is already in sequence, a flag may be set 
and tested on each pass so that if no swaps are made, no more unnecessary passes are made through 
an already sorted list. This is shown in Example 2 on the next page. 
Example 1


Write a pseudocode algorithm for a bubble sort to sort the numbers 9, 5, 4, 15, 3, 8, 11 into ascending 
sequence. Print the numbers after each of the 6 passes through the list. 


numbers = [9, 5, 4, 15, 3, 8, 11]
numItems = len(numbers)    // get number of items in the array
for i = 0 to numItems - 2
    for j = 0 to (numItems - i - 2)
        if numbers[j] > numbers[j + 1] then
            // Swap the items in the array
            temp = numbers[j]
            numbers[j] = numbers[j + 1]
            numbers[j + 1] = temp
        endif
    next j
    print(numbers)
next i


If you run this program, the output is 
[5, 4, 9, 3, 8, 11, 15]
[4, 5, 3, 8, 9, 11, 15]
[4, 3, 5, 8, 9, 11, 15]
[3, 4, 5, 8, 9, 11, 15]
[3, 4, 5, 8, 9, 11, 15]
[3, 4, 5, 8, 9, 11, 15]


The last pass through the list was not necessary. 


Example 2 
Amend the algorithm so that no unnecessary passes are made through the list.
numbers = [9, 5, 4, 15, 3, 8, 11]
numItems = len(numbers)    // get number of items in the array
flag = True                // indicates when a swap is made
i = 0
while i < (numItems - 1) and (flag = True)
    flag = False
    for j = 0 to numItems - i - 2
        if numbers[j] > numbers[j + 1] then
            // Swap the items in the array
            temp = numbers[j]
            numbers[j] = numbers[j + 1]
            numbers[j + 1] = temp
            flag = True
        endif
    next j
    i = i + 1
endwhile
print(numbers)
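
The flagged version of the bubble sort can be sketched in Python as below. The function name, the variable swapped (playing the role of flag) and the returned pass count are assumptions added for illustration.

```python
def bubble_sort(numbers):
    """Sort numbers in place and return the number of passes made."""
    num_items = len(numbers)
    i = 0
    swapped = True                      # plays the role of flag
    while i < num_items - 1 and swapped:
        swapped = False
        for j in range(num_items - i - 1):          # j = 0 to num_items - i - 2
            if numbers[j] > numbers[j + 1]:
                # swap the two neighbouring items
                numbers[j], numbers[j + 1] = numbers[j + 1], numbers[j]
                swapped = True
        i += 1
    return i


numbers = [9, 5, 4, 15, 3, 8, 11]
passes = bubble_sort(numbers)
print(numbers)   # [3, 4, 5, 8, 9, 11, 15]
print(passes)    # 5
```

Five passes are made rather than six: the fifth pass makes no swaps, so the loop ends without a sixth.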
Insertion Sort


This is a sorting algorithm that sorts one data item at a time. It is rather similar
to how you might sort a hand of cards. The algorithm takes one data item
from the list and places it in the correct location in the list. This process is
repeated until there are no more unsorted data items in the list. Although more
efficient than the bubble sort, it is not as efficient as the merge sort or quick sort.


Example 4 Insertion sort
The same list of numbers is sorted into ascending order using an insertion sort:


9, 5, 4, 15, 3, 8, 11


We leave the first item at the start of the list                        9, 5, 4, 15, 3, 8, 11
5 is now inserted into the sorted list                       1st pass   5, 9, 4, 15, 3, 8, 11
4 is now inserted into the sorted list                       2nd pass   4, 5, 9, 15, 3, 8, 11
15 is now inserted into the sorted list (it stays where it is) 3rd pass   4, 5, 9, 15, 3, 8, 11
3 is now inserted into the sorted list                       4th pass   3, 4, 5, 9, 15, 8, 11
8 is now inserted into the sorted list                       5th pass   3, 4, 5, 8, 9, 15, 11
11 is now inserted into the sorted list                      6th pass   3, 4, 5, 8, 9, 11, 15


On each pass, the current data item is checked against those already in the sorted list (shaded in the
diagram). If the data item being compared in the sorted list is larger than the current data item, it is
shifted to the right. This continues to happen until we reach a data item in the sorted list which is smaller
than the current data item.


For example, at the 5th pass 8 is compared with 15, and since it is smaller, 15 is shifted right. 
8 is compared with 9, and 9 is shifted right. 


8 is compared with 5, and as it is larger, it is inserted into the free space. 


5th pass in summary:


8 is removed from the list temporarily


Since 15 > 8, it is shifted to the right


Since 9 > 8, it is shifted to the right


Since 5 < 8, 8 is inserted back into the sorted list
Algorithm for insertion sort
Here is the algorithm for a procedure to do an insertion sort:


procedure insertionSort(alist)
    n = len(alist)
    for index = 1 to n - 1
        currentvalue = alist[index]
        position = index
        while position > 0 and alist[position - 1] > currentvalue
            alist[position] = alist[position - 1]
            position = position - 1
        endwhile
        alist[position] = currentvalue
    next index
endprocedure


// main program
alist = [9, 5, 4, 15, 3, 8, 11]
insertionSort(alist)
print("Sorted list ", alist)
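
A direct Python sketch of this procedure is given below; the names are illustrative assumptions. As in the pseudocode, the list is sorted in place, so nothing is returned.

```python
def insertion_sort(a_list):
    """Sort a_list in place using the insertion sort."""
    for index in range(1, len(a_list)):
        current_value = a_list[index]        # item to be inserted
        position = index
        # shift larger items in the sorted part one place to the right
        while position > 0 and a_list[position - 1] > current_value:
            a_list[position] = a_list[position - 1]
            position -= 1
        a_list[position] = current_value     # drop the item into the free space


a_list = [9, 5, 4, 15, 3, 8, 11]
insertion_sort(a_list)
print("Sorted list", a_list)   # Sorted list [3, 4, 5, 8, 9, 11, 15]
```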


A-Level only 


Time complexity of bubble and insertion sorts 


The bubble sort requires close to n passes through the list, with each pass requiring a maximum of n - 1
swaps. It is of order O(n²).


The insertion sort also has two nested loops and so has time complexity O(n²). However, if the list is
already almost sorted, the time complexity is reduced to close to O(n).
Exercises 


1. (a) A bubble sort is performed on the following list: 
3, 5, 8, 17, 12, 15, 18, 23, 1 
(i) Describe how a bubble sort works. [3] 
(ii) What is the sequence of the list after the first pass is completed? [1] 


(iii) How many passes through the list will be required to sort the items into ascending 
numerical sequence? [1] 


(b) An insertion sort is performed on the same list as in part (a).


(i) Describe how an insertion sort works. [3]
(ii) What is the sequence of the list after the first pass is completed? [1]
(iii) What is the average time complexity of the insertion sort? [1]
Chapter 62 — Merge sort and quick sort


Objectives


• Understand and be able to trace the merge sort and quick sort algorithms


Merge sort 


The merge sort uses a divide and conquer approach. The list is successively divided in half, forming 
two sublists, until each sublist is of length one. The sublists are then sorted and merged into larger 
sublists until they are recombined into a single sorted list. The basic steps are: 


• Divide the unsorted list into n sublists, each containing one element


• Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining.
This is the sorted list.


The merge process is shown below for a list in the initial sequence 5, 3, 2, 7, 9, 1, 3, 8.


Initial sequence split into sublists each of length 1:
[5] [3] [2] [7] [9] [1] [3] [8]

Merge:
[3, 5] [2, 7] [1, 9] [3, 8]

Merge:
[2, 3, 5, 7] [1, 3, 8, 9]

Merge:
[1, 2, 3, 3, 5, 7, 8, 9]

Final merged and sorted list


The list is first split into sublists each containing one element. 


The merge process merges each pair of sublists into the correct sequence. Taking for example two
lists: leftlist = [2,3] and rightlist = [1,3], the merge process works like this:


1. Compare the first item in leftlist with the first item in rightlist


2. If item in leftlist < item in rightlist, add item from leftlist to mergedlist and read
the next item from leftlist


3. Otherwise, add item from rightlist to mergedlist and read the next item from
rightlist

4. Once one list is empty, any remaining items are copied into the merged list

5. Repeat from Step 2 until all items are in mergedlist

The process is then repeated for each pair of sublists until the lists are merged into the final sorted list. 
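
The merge of one pair of sorted sublists can be sketched in Python as follows. The function name merge is an assumption, and the two sample lists are the final pair of sublists for the initial sequence 5, 3, 2, 7, 9, 1, 3, 8.

```python
def merge(leftlist, rightlist):
    """Merge two sorted lists into one new sorted list."""
    mergedlist = []
    i = 0    # next unread item in leftlist
    j = 0    # next unread item in rightlist
    while i < len(leftlist) and j < len(rightlist):
        if leftlist[i] < rightlist[j]:
            mergedlist.append(leftlist[i])
            i += 1
        else:
            mergedlist.append(rightlist[j])
            j += 1
    # one list is now empty, so copy whatever remains of the other
    mergedlist.extend(leftlist[i:])
    mergedlist.extend(rightlist[j:])
    return mergedlist


print(merge([2, 3, 5, 7], [1, 3, 8, 9]))   # [1, 2, 3, 3, 5, 7, 8, 9]
```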
An algorithm for the merge sort is given below.


procedure mergesort(mergelist)
    if len(mergelist) > 1 then
        mid = len(mergelist) div 2      // performs integer division
        lefthalf = mergelist[:mid]      // left half of mergelist into lefthalf
        righthalf = mergelist[mid:]     // right half of mergelist into righthalf

        mergesort(lefthalf)
        mergesort(righthalf)

        i = 0
        j = 0
        k = 0

        while i < len(lefthalf) and j < len(righthalf)
            if lefthalf[i] < righthalf[j] then
                mergelist[k] = lefthalf[i]
                i = i + 1
            else
                mergelist[k] = righthalf[j]
                j = j + 1
            endif
            k = k + 1
        endwhile
        // check if left half has elements not merged
        while i < len(lefthalf)
            mergelist[k] = lefthalf[i]      // if so, add to mergelist
            i = i + 1
            k = k + 1
        endwhile
        // check if right half has elements not merged
        while j < len(righthalf)
            mergelist[k] = righthalf[j]     // if so, add to mergelist
            j = j + 1
            k = k + 1
        endwhile
    endif
endprocedure

// ****** main program ******
mergelist = [5, 3, 2, 7, 9, 1, 3, 8]
mergesort(mergelist)
print(mergelist)
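
The merge sort translates almost line for line into Python, since list slicing is available directly. The sketch below uses assumed names and sorts the list in place.

```python
def merge_sort(merge_list):
    """Sort merge_list in place using the merge sort."""
    if len(merge_list) > 1:
        mid = len(merge_list) // 2          # integer division
        left_half = merge_list[:mid]        # copies of the two halves
        right_half = merge_list[mid:]

        merge_sort(left_half)
        merge_sort(right_half)

        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                merge_list[k] = left_half[i]
                i += 1
            else:
                merge_list[k] = right_half[j]
                j += 1
            k += 1
        while i < len(left_half):           # left half has items not yet merged
            merge_list[k] = left_half[i]
            i += 1
            k += 1
        while j < len(right_half):          # right half has items not yet merged
            merge_list[k] = right_half[j]
            j += 1
            k += 1


merge_list = [5, 3, 2, 7, 9, 1, 3, 8]
merge_sort(merge_list)
print(merge_list)   # [1, 2, 3, 3, 5, 7, 8, 9]
```

The slices left_half and right_half are new lists, which is where the additional memory used by the merge sort comes from.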
Time complexity of merge sort


The merge sort is another example of a divide and conquer algorithm, but in this case, there are n 
sublists to be merged, so the time complexity has to be multiplied by a factor of n. 


The time complexity is therefore O(nlog n). 


Space complexity 


The amount of resources such as memory that an algorithm requires, known as the space complexity, 
is also a consideration when comparing the efficiency of algorithms. The bubble sort, for example, 
requires n memory locations for a list of size n. The merge sort, on the other hand, requires additional 
memory to hold the left half and right half of the list, so takes twice the amount of memory space. 


Quick sort 


The quick sort algorithm, like the merge sort, uses a divide and conquer approach to quickly reduce
the size of the problem, but without using the additional storage required by the merge sort.


The steps in the quick sort are as follows: 


1. Select a value called the pivot value. There are different ways to choose the pivot value but we will 
choose the first item in the list. The actual position where the pivot value belongs in the final sorted 
list, called the split point, will be used to divide the list for subsequent calls. In the list shown below, 
9 is the first pivot value. 


2. Divide the remainder of the list into two partitions 


• all elements less than the pivot value must be in the first partition


• all elements greater than the pivot value must be in the second partition


(The order of the elements in each partition is not significant in this explanation. It will become clearer 
in the explanation of the detailed procedure.) 
3. 3 and 15 are now the pivots in the left and right partitions. Recursively repeat the process.


The list is now in sequence. 


The detailed procedure 
Below is a more detailed description of how the pivots are found at each point. 
Locate two position markers called leftmark and rightmark at the beginning and end of the remaining
items in the list (positions 1 and 6 in the figure). The goal of the partitioning process is to move items that
are on the wrong side of the pivot value while also converging on the split point.


9 | 5 | 4 | 15 | 3 | 8 | 11
5 < 9 so move leftmark to right. 4 < 9 so move right. 15 > 9 so stop.
11 > 9 so move rightmark to left. 8 < 9 so stop.


9 | 5 | 4 | 15 | 3 | 8 | 11     (leftmark at 15, rightmark at 8)
Exchange 15 and 8, and continue moving leftmark and rightmark.


9 | 5 | 4 | 8 | 3 | 15 | 11
8 < 9 so move leftmark to right. 3 < 9 so move to right. 15 > 9 so stop.
15 > 9 so move rightmark left.


Rightmark and leftmark have now crossed over, so we stop. The position of rightmark is now the split
point. The pivot value is exchanged with the contents of the split point and the pivot value is now in
place.


3 | 5 | 4 | 8 | 9 | 15 | 11


All the items to the left of the split point are less than the pivot value, and all the items to the right of the
split point are greater than the pivot value. The list can now be divided at the split point and the quick
sort invoked recursively on the two halves.


quicksort left half: 3 | 5 | 4 | 8          quicksort right half: 15 | 11
The quick sort algorithm


The quick sort algorithm shown below is recursive, repeatedly dividing the list at the split point until each 
half is of length 1, at which point the list is sorted. Most of the work is done in the partition function, 
which finds the split point. 


function partition(alist, start, end)
    pivot = alist[start]
    leftmark = start + 1
    rightmark = end
    done = False
    while done = False
        while leftmark <= rightmark and alist[leftmark] <= pivot
            leftmark = leftmark + 1
        endwhile
        while rightmark >= leftmark and alist[rightmark] >= pivot
            rightmark = rightmark - 1
        endwhile
        if rightmark < leftmark then
            done = True
        else
            // swap the list items
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp
        endif
    endwhile
    // swap the pivot with alist[rightmark]
    temp = alist[start]
    alist[start] = alist[rightmark]
    alist[rightmark] = temp
    return rightmark
endfunction


function quicksort(alist, start, end)
    if start < end then
        // partition the list
        split = partition(alist, start, end)
        // sort both halves
        quicksort(alist, start, split - 1)
        quicksort(alist, split + 1, end)
    endif
    return alist
endfunction


alist = [9, 5, 4, 15, 3, 8, 11]
sortedList = quicksort(alist, 0, len(alist) - 1)
print(sortedList)
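
A Python sketch of the same two functions is shown below, under the same assumption that the first item of each sublist is chosen as the pivot. The names follow the pseudocode, except that `end` is kept as an ordinary parameter name and a break replaces the done flag.

```python
def partition(a_list, start, end):
    """Place the pivot a_list[start] in its final position and return that index."""
    pivot = a_list[start]
    leftmark = start + 1
    rightmark = end
    while True:
        while leftmark <= rightmark and a_list[leftmark] <= pivot:
            leftmark += 1
        while rightmark >= leftmark and a_list[rightmark] >= pivot:
            rightmark -= 1
        if rightmark < leftmark:
            break                            # the markers have crossed over
        # both items are on the wrong side of the pivot, so swap them
        a_list[leftmark], a_list[rightmark] = a_list[rightmark], a_list[leftmark]
    # swap the pivot into the split point
    a_list[start], a_list[rightmark] = a_list[rightmark], a_list[start]
    return rightmark


def quicksort(a_list, start, end):
    """Sort a_list[start..end] in place and return the list."""
    if start < end:
        split = partition(a_list, start, end)
        quicksort(a_list, start, split - 1)
        quicksort(a_list, split + 1, end)
    return a_list


a_list = [9, 5, 4, 15, 3, 8, 11]
print(quicksort(a_list, 0, len(a_list) - 1))   # [3, 4, 5, 8, 9, 11, 15]
```

Calling partition on the original list returns 4 and leaves the list as 3, 5, 4, 8, 9, 15, 11, which agrees with the detailed procedure.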
Advantages and disadvantages of the quick sort algorithm


The quicksort algorithm is extremely fast. If the partition always occurs in the middle of the list, there will
be log n divisions in a list of length n, and each of the n items needs to be checked against the pivot
value to find the split point. It therefore has time complexity O(n log n).


Another advantage is that, unlike the merge sort, it does not need additional memory.


A disadvantage is that if the split points are not near the middle of the list, but are close to the start
or end of the list, the division will be very uneven. If the split point is, for example, the first item in the
sequenced list, the division results in a list of 0 items and a list of n-1 items. The list of n-1 items divides
into 0 items and n-2 items and so on. The resulting time complexity is O(n²).


If the list is very large, and recursion continues too long, it may cause stack overflow and the program 
will crash. 


Summary of sort algorithms 


• Bubble sort is the slowest of the sorts, with time complexity O(n²)
• Insertion sort is O(n²) but if the list is already almost sorted, this reduces to O(n)
• Merge sort is O(n log n) but requires additional memory space for the merging process


• Quick sort is generally the fastest sort, but is dependent on using a pivot that is not close to the
smallest or largest elements of the list. There are several methods for selecting a pivot to ensure this
does not happen. It has average time complexity O(n log n). It does not require additional memory
space.


Exercises 


1. (a) There are many methods of sorting a set of records into ascending order of key. 
What factors would you consider in deciding which of these methods is the most suitable 
for a particular application? [2] 


(b) The merge sort algorithm has time complexity O(n log n). For a list of 1,024 items in
random sequence, is this algorithm more or less efficient than a sort algorithm of time
complexity O(n²)? Explain your answer. [3]


2. Explain briefly the steps in 
(a) the merge sort algorithm [4] 


(b) the quick sort algorithm [4] 
Chapter 63 — Graph-traversal algorithms


Objectives


• Be able to trace depth-first and breadth-first algorithms
• Describe typical applications of each


Graph traversals 


There are two ways to traverse a graph so that every node is visited. Each of them uses a supporting 
data structure to keep track of which nodes have been visited, and which node to visit next. 


• A depth-first traversal uses a stack, which is implemented automatically during execution of a
recursive routine to hold local variables, parameters and return addresses each time a subroutine is
called. Alternatively, a non-recursive routine could be written and the stack maintained as part of
the routine.


• A breadth-first traversal uses a queue.


Depth-first traversal 
In this traversal, we go as far down one route as we can before backtracking and taking the next route. 


The following recursive subroutine dfs is called initially from the main program, which passes it a graph,
defined here as an adjacency list (see Chapter 38) and implemented as a dictionary with nodes A, B,
C, ... as keys, and neighbours of each node as data. Thus if "A" is the current vertex, graph["A"] will
return the list ["B", "D", "E"] with reference to the algorithm below and the graph overleaf.


The calling program also passes an empty list of visited nodes and a starting vertex. 


Check the graph in Step 1 on the next page to verify that it corresponds to the nodes and their 
neighbours. There are different ways of drawing the graph but logically they should all be equivalent! 


GRAPH = {
    "A": ["B", "D", "E"],
    "B": ["A", "C", "D"],
    "C": ["B", "G"],
    "D": ["A", "B", "E", "F"],
    "E": ["A", "D"],
    "F": ["D"],
    "G": ["C"]
}
visitedList = []    // an empty list of visited nodes


function dfs(graph, currentVertex, visited) 
append currentVertex to list of visited nodes 
for vertex in graph[currentVertex] // check neighbours 
if vertex not in visited then 


dfs(graph, vertex, visited) // recursive call 
// stack will store return address, parameters and local variables 
endif 


next vertex 
return visited 
endfunction 


// main program
traversal = dfs(GRAPH, "A", visitedList) 
print ("Nodes visited in this order: ", traversal) 
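
A runnable Python sketch of the recursive depth-first traversal follows. The adjacency lists are an assumption reconstructed for this illustration; in particular, B's neighbours are listed in the order A, C, D so that the nodes are visited in the order A, B, C, G, D, E, F, the order used in the step-by-step trace.

```python
# Assumed adjacency list: each key is a node, each value its list of neighbours.
GRAPH = {
    "A": ["B", "D", "E"],
    "B": ["A", "C", "D"],
    "C": ["B", "G"],
    "D": ["A", "B", "E", "F"],
    "E": ["A", "D"],
    "F": ["D"],
    "G": ["C"],
}


def dfs(graph, current_vertex, visited):
    """Visit current_vertex, then recursively visit each unvisited neighbour."""
    visited.append(current_vertex)
    for vertex in graph[current_vertex]:       # check neighbours in list order
        if vertex not in visited:
            # Python's call stack remembers where to resume after this call
            dfs(graph, vertex, visited)
    return visited


traversal = dfs(GRAPH, "A", [])
print("Nodes visited in this order:", traversal)
# Nodes visited in this order: ['A', 'B', 'C', 'G', 'D', 'E', 'F']
```

Listing the neighbours in a different order gives a different, but equally valid, depth-first order.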
It is easiest to understand how this works by looking at the graphs below. This shows the state of the
stack (here it just shows the current node when a recursive call is made), and the contents of the visited
list. Each visited node is coloured dark blue.


1. Start the routine with an empty stack and an empty list of visited nodes.
   Visited: (empty)   Stack: (empty)

2. Visit A, add it to the visited list. Colour it to show it has been visited.
   Visited: A   Stack: (empty)

3. Push A onto the stack to keep track of where we have come from and visit A's first neighbour, B. Add it to the
   visited list. Colour it to show it has been visited.
   Visited: A B   Stack: A

4. Push B onto the stack and from B, visit the next unvisited node, C. Add it to the visited list. Colour it to
   show it has been visited.
   Visited: A B C   Stack: A, B

5. Push C onto the stack and from C, visit the next unvisited node, G. Add it to the visited list. Colour it to
   show it has been visited.
   Visited: A B C G   Stack: A, B, C

6. At G, there are no unvisited nodes so we backtrack. Pop the previous node C off the stack and return to C.
   Visited: A B C G   Stack: A, B

7. At C, all adjacent nodes have been visited, so backtrack again. Pop B off the stack and return to B.
   Visited: A B C G   Stack: A

8. Push B back onto the stack to keep track of where we have come from and visit D. Add it to the visited list.
   Colour it to show it has been visited.
   Visited: A B C G D   Stack: A, B

9. Push D onto the stack and visit E. Add it to the visited list. Colour it to show it has been visited.
   Visited: A B C G D E   Stack: A, B, D

10. From E, A and D have already been visited so pop D off the stack and return to D.
   Visited: A B C G D E   Stack: A, B

11. Push D back onto the stack and visit F. Add it to the visited list. Colour it to show it has been visited.
   Visited: A B C G D E F   Stack: A, B, D

12. At F, there are no unvisited nodes so we pop D, then B, then A, whose neighbours have all been visited.
   The stack is now empty which means every node has been visited and the algorithm has completed.
   Visited: A B C G D E F   Stack: (empty)
Breadth-first traversal


With a breadth-first traversal, starting at A we first visit all the nodes adjacent to A before moving to B and
repeating the process for each node at this ‘level’, before moving to the next level. Instead of a stack, a queue
is used to keep track of nodes that we still have to visit. Nodes are coloured pale blue when queued and dark
blue when dequeued and added to the list of nodes that have been visited.


1. Append A to the empty queue at the start of the routine. This will be the first visited node.
   Visited: (empty)   Queue: A

2. Dequeue A and mark it by colouring it dark blue. Add it to the visited list.
   Visited: A   Queue: (empty)

3. Queue each of A's adjacent nodes B, D and E in turn. Colour each node pale blue to show it has been
   queued.
   Visited: A   Queue: B, D, E

4. We've now finished with A, so dequeue the first item in the queue, which is B. Mark it by colouring it dark blue
   and add it to the visited list.
   Visited: A B   Queue: D, E

5. Queue B's remaining neighbour C. Colour it pale blue to show it has been queued.
   Visited: A B   Queue: D, E, C

6. B's neighbours are all coloured, so dequeue the first item in the queue, which is D. Mark it by colouring it
   dark blue and add it to the visited list.
   Visited: A B D   Queue: E, C

7. D's adjacent node E has already been queued and coloured. Add D's adjacent node F to the queue.
   Colour it pale blue to show it has been queued.
   Visited: A B D   Queue: E, C, F

8. Dequeue the first item, E. Mark it by colouring it dark blue and add it to the visited list.
   Visited: A B D E   Queue: C, F

9. E's neighbours are all coloured, so dequeue the next item, C. Mark it by colouring it dark blue and add it to
   the visited list.
   Visited: A B D E C   Queue: F

10. Add C's adjacent node G to the queue and colour it pale blue to show it has been queued.
   Visited: A B D E C   Queue: F, G

11. C's neighbours are all coloured now, so dequeue F, mark it by colouring it dark blue and add it to the
   visited list.
   Visited: A B D E C F   Queue: G

12. Finally, dequeue G, mark it by colouring it dark blue and add it to the visited list. The queue is now empty
   and all the nodes have been visited.
   Visited: A B D E C F G   Queue: (empty)
Note that we need to distinguish between a dequeued vertex that is added to the visited list and whose
neighbours we are examining, which we colour dark blue, and neighbours of the current vertex, which we 
put in the queue and colour pale blue to show they have been queued but not visited. 


Pseudocode algorithm for breadth-first traversal 


The following algorithm assumes you are starting from a vertex currentVertex. The queue q is a
dynamic data structure implemented for example as a list. A second list called visitedNodes holds
the nodes that have been visited. Colours Black, Grey and White are more traditional in this algorithm
than Dark Blue, Pale Blue and White so are used here (the diagrams are clearer in colour!).


The breadth-first traversal is an iterative, rather than a recursive routine. The first node (‘A’ in this
example) is appended to the empty queue as soon as the subroutine is entered. A Python definition
of the graph as a dictionary is given below for interest, but is not directly used in the pseudocode, as
implementations will vary in different languages.


GRAPH = {
    "A": {"colour": "White", "neighbours": ["B", "D", "E"]},
    "B": {"colour": "White", "neighbours": ["A", "D", "C"]},
    "C": {"colour": "White", "neighbours": ["B", "G"]},
    "D": {"colour": "White", "neighbours": ["A", "B", "E", "F"]},
    "E": {"colour": "White", "neighbours": ["A", "D"]},
    "F": {"colour": "White", "neighbours": ["D"]},
    "G": {"colour": "White", "neighbours": ["C"]}
}


function bfs(graph, vertex) 
queue = [] // an empty queue 
visited = [] // an empty list of visited nodes 
enqueue vertex
while queue not empty 
dequeue item and put in currentNode 
set colour of currentNode to "Black" 
append currentNode to visited 
for each neighbour of currentNode 
if colour of neighbour = "White" then 
enqueue neighbour 
set colour of neighbour to "Grey" 
endif 
next neighbour 
endwhile 
return visited 
endfunction 


// main 
visited = bfs(GRAPH, "A") 
print ("List of nodes visited: ", visited) 
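
The breadth-first traversal might be written in Python along the following lines. This is a sketch under stated assumptions: the graph dictionary is a reconstruction of the one described in the text, collections.deque is used for the queue, and the starting vertex is coloured Grey when it is queued, a small addition that the pseudocode does not make.

```python
from collections import deque

# Assumed graph definition: each node has a colour and a list of neighbours.
GRAPH = {
    "A": {"colour": "White", "neighbours": ["B", "D", "E"]},
    "B": {"colour": "White", "neighbours": ["A", "D", "C"]},
    "C": {"colour": "White", "neighbours": ["B", "G"]},
    "D": {"colour": "White", "neighbours": ["A", "B", "E", "F"]},
    "E": {"colour": "White", "neighbours": ["A", "D"]},
    "F": {"colour": "White", "neighbours": ["D"]},
    "G": {"colour": "White", "neighbours": ["C"]},
}


def bfs(graph, vertex):
    """Return the nodes reachable from vertex in breadth-first order."""
    queue = deque()                  # an empty queue
    visited = []                     # an empty list of visited nodes
    queue.append(vertex)
    graph[vertex]["colour"] = "Grey"
    while queue:
        current_node = queue.popleft()               # dequeue
        graph[current_node]["colour"] = "Black"
        visited.append(current_node)
        for neighbour in graph[current_node]["neighbours"]:
            if graph[neighbour]["colour"] == "White":
                queue.append(neighbour)              # enqueue
                graph[neighbour]["colour"] = "Grey"
    return visited


visited = bfs(GRAPH, "A")
print("List of nodes visited:", visited)
# List of nodes visited: ['A', 'B', 'D', 'E', 'C', 'F', 'G']
```

Because the colours are stored in the graph itself, the dictionary would need resetting to White before a second traversal.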
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« In scheduling jobs where a series of tasks is to be performed, and certain tasks must be completed 
before the next one begins. 


Applications of depth-first search 


Applications of the depth-first search include the following: 


• In solving problems such as mazes, which can be represented as a graph


Finding a way through a maze 


A depth-first search can be used to find a way out of a maze. Junctions where there is a choice of route 
in the maze are represented as nodes on a graph. 


Q1: (a) Redraw the graph without showing the dead ends. 
(b) State the properties of this graph that make it a tree.


(c) Complete the table below to show how the graph would be represented using an 
adjacency matrix. 

Applications of breadth-first search


Breadth-first searches are used to solve many real-life problems. For example: 


• A major application of a breadth-first search is to find the shortest path between two points A and
B, and this will be explained in detail in the next chapter. Finding the shortest path is important in, for
example, GPS navigation systems and computer networks.


• Facebook. Each user profile is regarded as a node or vertex in the graph, and two nodes are connected
if they are each other's friends.


• Web crawlers. A web crawler can analyse all the sites you can reach by following links randomly on a
particular website.


Depth-first tree traversal 


A tree is a special case of a graph, being defined as a connected, undirected graph with no cycles (see 
Chapter 39.) 


Remember that a depth-first traversal of a graph (and therefore, of a tree) goes as far down one path as
possible, before backing up to the nearest node with an unexplored branch and exploring that path as far as it goes. A
depth-first traversal of this tree visits nodes in the order


Monkey, Giraffe, Buffalo, Baboon, Cheetah, Hippo, Jackal, Topi, Ostrich, Rhino, Zebra. 


You should have discovered that the nodes are visited in the same order — in other words, a depth-first 
tree traversal is equivalent to a pre-order traversal. 


Although it would be quite possible to do a depth-first tree traversal using the algorithm given above 
using the stack as a “helper” data structure, a much simpler algorithm is given in Chapter 39. 
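
As an illustration, a recursive pre-order traversal might be sketched in Python as below. The tree diagram is not reproduced in this text, so the TREE dictionary is an assumption, chosen to be consistent with the visiting orders quoted in this chapter; the names TREE and pre_order are hypothetical.

```python
# Assumed tree shape: each node maps to (left child, right child); None = no child
TREE = {
    "Monkey": ("Giraffe", "Topi"),
    "Giraffe": ("Buffalo", "Hippo"),
    "Topi": ("Ostrich", "Zebra"),
    "Buffalo": ("Baboon", "Cheetah"),
    "Hippo": (None, "Jackal"),
    "Ostrich": (None, "Rhino"),
    "Zebra": (None, None),
    "Baboon": (None, None),
    "Cheetah": (None, None),
    "Jackal": (None, None),
    "Rhino": (None, None),
}


def pre_order(tree, node, visited=None):
    """Visit the node itself, then its left subtree, then its right subtree."""
    if visited is None:
        visited = []
    if node is not None:
        visited.append(node)
        left, right = tree[node]
        pre_order(tree, left, visited)
        pre_order(tree, right, visited)
    return visited


order = pre_order(TREE, "Monkey")
print(order)
```

No explicit stack is needed here because the recursive calls use the call stack to remember where to back up to.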


Breadth-first tree traversal 


A breadth-first traversal of the tree visits nodes in the order


Monkey, Giraffe, Topi, Buffalo, Hippo, Ostrich, Zebra, Baboon, Cheetah, Jackal, Rhino. 


They are not the same! The breadth-first traversal is best done using the algorithm for the breadth-first
graph traversal, using a queue as the “helper” data structure.
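
A minimal Python sketch of a queue-based breadth-first tree traversal is given below. The CHILDREN dictionary is an assumed tree shape, chosen to be consistent with the orders quoted in this chapter, and the function name breadth_first is hypothetical.

```python
from collections import deque

# Assumed tree shape: children listed left to right; leaf nodes are omitted
CHILDREN = {
    "Monkey": ["Giraffe", "Topi"],
    "Giraffe": ["Buffalo", "Hippo"],
    "Topi": ["Ostrich", "Zebra"],
    "Buffalo": ["Baboon", "Cheetah"],
    "Hippo": ["Jackal"],
    "Ostrich": ["Rhino"],
}


def breadth_first(children, root):
    """Visit the tree level by level, using a queue as the helper structure."""
    queue = deque([root])
    visited = []
    while queue:
        node = queue.popleft()                 # take the node at the front
        visited.append(node)
        queue.extend(children.get(node, []))   # its children join the back
    return visited


order = breadth_first(CHILDREN, "Monkey")
print(order)
```

Because a tree has no cycles, no node can be reached twice, so this sketch does not need the colours used in the graph version.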
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Exercises 


1. (a) Name the supporting data structure which is commonly used when traversing a graph 
(i) depth-first 
(ii) breadth-first 
(b) Show the order in which vertices in the following graph are visited, starting at A, using
(i) depth-first traversal 


(ii) breadth-first traversal 


(c) (i) Explain why the graph above is not a tree. Which edges would need to be removed 
for it to be a tree? 


(ii) Show, by traversing the tree below using a pre-order traversal and writing the nodes 
in the order that they are visited, that a pre-order tree traversal is equivalent to a 
depth-first graph traversal. 


2. List the order in which nodes in the tree below will be visited using 
(a) a breadth-first traversal 


(b) a post-order traversal.
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Chapter 64 — Optimisation algorithms 


Objectives 


• Understand and be able to trace Dijkstra’s shortest path algorithm
• Be aware of applications of the shortest path algorithm
• Describe the A* algorithm


Optimisation problems 


We increasingly rely on computers to find the optimum solution to a range of different problems. 
For example: 


• scheduling aeroplanes and staff so that air crews always have the correct minimum rest time
between flights


• finding the best move in a chess problem
• timetabling classes in schools and colleges


• finding the shortest path between two points, for building circuit boards, route planning,
communications networks and many other applications


Finding the shortest path from A to B has numerous applications in everyday life and in computer-related 
problems. For example, if you visit a site like Google Maps to get directions from your current location to 
a particular destination, you probably want to know the shortest route. The software that finds it for you 
will use representations of street maps or roads as graphs, with estimated driving times or distances as 
edge weights. 


Dijkstra’s shortest path algorithm 


Dijkstra (pronounced dike-stra) lived from 1930 to 2002. He was a Dutch computer scientist who 
received the Turing award in 1972 for fundamental contributions to developing programming languages. 
He wrote a paper in 1968 which was published under the heading “GO TO Statement Considered
Harmful” and was an advocate of structured programming.


Dijkstra's algorithm is designed to find the shortest path between one particular start node and all other
nodes in a weighted graph. It is similar to a breadth-first search, but uses a priority queue ordered by
distance rather than a simple first-in, first-out queue.


The weights could represent, for example, distances or time taken to travel between towns, or the cost of 
travel between airports. 

A-Level only 


The algorithm


The algorithm works as follows:


Assign a temporary distance value to every node, starting with zero for
the initial node and infinity for every other node


Add all the vertices to a priority queue, sorted by current distance
(This puts the initial node at the front, the rest in random order.)


while the queue is not empty
    remove the vertex u from the front of the queue
    for each unvisited neighbour w of the current vertex u
        newDistance = distanceAtU + distanceFromUtoW
        if newDistance < distanceAtW then
            distanceAtW = newDistance
            change position of w in priority queue to reflect new
            distance to w
        endif
    next w
endwhile


Example 


In the figure below, A is the start node. A temporary distance value has been assigned to every node,
starting with zero for the start node and infinity for every other node.


The priority queue is shown beside the graph, and it is kept in order of vertices with the shortest
known distance from A. To start with, A is at the front, and the other nodes are in random order, in this
case alphabetical.


The vertices are coloured. 


• White vertices have not been visited and their distances remain at infinity.


• Pale blue vertices have been partially explored. A tentative distance to them has been found but all
possible paths to them have not yet been explored, so this distance cannot be guaranteed to be the
shortest one and they remain in the queue.


• Dark blue vertices have been removed from the queue and their minimum distance from A has been
found. These vertices are described as having been visited.


Start at A, remove it from the front of the queue and shade it dark blue to show it has been visited.


Priority queue: [B=∞ | C=∞ | D=∞ | E=∞]


Node A has two neighbours B and D. Shade each of these pale blue to show they have been partially
explored, and calculate new distance values for nodes B and D by taking the distance value at A (i.e.
zero) and adding it to the edge weights between A and B, and between A and D.

Since both of these values are less than infinity, update the distances at B and D. The distance at D is less than
the distance at B, so move D to the front of the priority queue.
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Remove D from the front of the queue. Shade it dark blue to show it has been visited. Shade D's 
neighbours C and E pale blue to show they have been partially explored. 


Now calculate new values for the unvisited neighbours of D, namely B, C and E. The distance between D
and B is 2, and this is added to the distance value at D (3, the weight of the edge between A and D).
3 + 2 = 5, which is less than the current value at B, so the distance value at B is changed to the new lowest value, 5.


The current tentative distance ∞ at C is replaced with 3 + 4 = 7, and that at E is replaced with 3 + 7 = 10.


The order of nodes in the priority queue does not need to be changed since B, the node with the 
smallest current distance from A, is already at the front. 


Priority queue: [B=5 | C=7 | E=10]


Remove B from the priority queue. Shade B dark blue to show it has been visited. 


At B, the values at C and E are calculated as 5 + 3 = 8 and 5 + 6 = 11 respectively, but these are both
greater than the tentative values already there, so these values are not changed.


Priority queue: [C=7 | E=10]


Remove C from the queue and shade it dark blue to show it has been visited. The distance to E via C is
calculated as 7 + 1 = 8. This is less than the current tentative distance to E (10), so it replaces it.




Remove E from the queue. It has no unvisited neighbours, so there are no new distances to calculate. 
Shade E dark blue. 
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The queue is empty, all the nodes have now been visited so the algorithm ends. 


We have found the shortest distance from A to every other node, and the shortest distance from A is
marked in blue at each node.
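
The trace above can be checked with a short Python sketch, given here as one possible implementation rather than a definitive one. Two assumptions are worth flagging. First, the edge weights are those quoted in the worked example, except the weight of A to B, which the text does not state (it only implies a value greater than 5), so 8 is assumed. Second, Python's heapq module cannot change the position of an item in the queue, so instead of adding every vertex at the start, this version pushes a new entry each time a shorter distance is found and ignores out-of-date entries when they reach the front.

```python
import heapq

# Edge weights from the worked example; the A-B weight (8) is an assumed value
GRAPH = {
    "A": {"B": 8, "D": 3},
    "B": {"A": 8, "C": 3, "D": 2, "E": 6},
    "C": {"B": 3, "D": 4, "E": 1},
    "D": {"A": 3, "B": 2, "C": 4, "E": 7},
    "E": {"B": 6, "C": 1, "D": 7},
}


def dijkstra(graph, start):
    """Return the shortest distance from start to every vertex in the graph."""
    distance = {vertex: float("inf") for vertex in graph}
    distance[start] = 0
    queue = [(0, start)]   # priority queue of (distance, vertex) pairs
    visited = set()
    while queue:
        dist_u, u = heapq.heappop(queue)   # vertex with the smallest distance
        if u in visited:
            continue   # out-of-date entry: u has already been finished with
        visited.add(u)
        for w, weight in graph[u].items():
            new_distance = dist_u + weight
            if w not in visited and new_distance < distance[w]:
                distance[w] = new_distance
                # push a fresh entry instead of repositioning w in the queue
                heapq.heappush(queue, (new_distance, w))
    return distance


shortest = dijkstra(GRAPH, "A")
print(shortest)
```

The distances it reports for B, C, D and E should agree with the values reached at the end of the worked example.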


Q1: Copy the graph below and use the method above to trace the shortest path from A to all other 
nodes. Write the shortest distance at each node. 


Q2: Use a similar method to trace the shortest path from A to all other nodes. Write the shortest 
distance at each node. What is the shortest distance from A to G? 


The A* algorithm 


Dijkstra's algorithm is a special case of a more general path-finding algorithm called the A* algorithm. 
Dijkstra's algorithm has one cost function, which is the real cost value (e.g. distance) from the source 
node to every other node. 


The A* algorithm has two cost functions:
1. g(x) - as with Dijkstra's algorithm, this is the real cost from the source to a given node.


2. h(x) - this is the approximate cost from node x to the goal node. It is a heuristic function, meaning
that it gives a good or adequate estimate, but not necessarily the exact value. The algorithm stipulates
that the heuristic function should never overestimate the cost, so the real cost should be
greater than or equal to h(x).


The total cost of each node is calculated as f(x) = g(x) + h(x). 


The A* algorithm focusses only on reaching the goal node, unlike Dijkstra's algorithm which finds the
lowest cost or shortest path to every node. It is used, for example, in video games to enable characters 
to navigate the world. 
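
To make the two cost functions concrete, here is a minimal Python sketch of A* on a small grid of the kind a game character might cross. Everything in it is hypothetical: the grid layout, the function names, the cost of 1 per move, and the choice of Manhattan distance as h(x). With moves restricted to up, down, left and right, Manhattan distance never overestimates the remaining cost.

```python
import heapq

# Hypothetical 4 x 4 grid: "." is open floor, "#" is a wall
GRID = [
    "....",
    ".##.",
    ".#..",
    "....",
]


def h(cell, goal):
    """Heuristic: Manhattan distance from cell to goal (never an overestimate)."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


def a_star(grid, start, goal):
    """Return the lowest cost from start to goal, or None if it is unreachable."""
    g = {start: 0}                      # real cost from the start to each cell
    queue = [(h(start, goal), start)]   # ordered by f(x) = g(x) + h(x)
    closed = set()
    while queue:
        _, current = heapq.heappop(queue)
        if current == goal:
            return g[current]           # stop as soon as the goal is reached
        if current in closed:
            continue                    # out-of-date queue entry
        closed.add(current)
        row, col = current
        for r, c in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == ".":
                new_g = g[current] + 1  # each move is assumed to cost 1
                if new_g < g.get((r, c), float("inf")):
                    g[(r, c)] = new_g
                    heapq.heappush(queue, (new_g + h((r, c), goal), (r, c)))
    return None


cost = a_star(GRID, (0, 0), (3, 3))
print(cost)
```

If h always returned zero, the queue would be ordered by g(x) alone and the search would behave like Dijkstra's algorithm, stopping when the goal is reached.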

1. (a) What is the purpose of Dijkstra’s shortest path algorithm?
(b) Describe briefly two applications of the algorithm.
(c) The weighted graph (Figure 1) shows the distances between each of the graph’s vertices. 


Copy Figure 1 and show the tentative distance from the starting node A allocated to each node 
after nodes B and D have been visited (dequeued and finished with) using Dijkstra's algorithm. 


Figure 1 


(d) A possible path from A to G is A → D → F → G.
(i) Describe in a similar way, the shortest path from A to C. What is its length? 


(ii) What is the shortest path from A to G? What is its length? 


[2] 
[3] 

2. The following graph shows distances between five cities. Dijkstra's shortest path algorithm is used
to find the shortest distance between Liverpool and each of the other cities. The algorithm is given
below.


Assign a temporary distance value to every node, starting with zero
for the initial node and infinity for every other node


Add all the vertices to a priority queue, sorted by current
distance. (This puts the initial node at the front, the rest, which
all start with temporary distances of infinity, in random order.)


while the queue is not empty
    remove the vertex u from the front of the queue
    for each unvisited neighbour w of the current vertex u
        newDistance = distanceAtU + distanceFromUtoW
        if newDistance < distanceAtW then
            distanceAtW = newDistance
            change position of w in priority queue to reflect new
            distance to w
        endif
    next w
endwhile

[Figure: weighted graph showing the distances between five cities, including Liverpool and Sheffield]


The following table represents the distances after the first statement in the algorithm is executed. 


(a) Complete the following table after one iteration of the WHILE loop in the above algorithm. [3] 
(b) Complete the table after the second iteration of the WHILE loop. [2]
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